Extract attribute data from an XMLElement

I’m trying to come to grips with parsing an XML file but I’m having difficulty working out how to extract specific attribute data from an XMLElement. Sample XML file below:




This data file created by Design-Expert 6















As you can see, this particular XML file relies on attributes to hold much of the data. I have no trouble selecting entire XMLElement(s) by name using Cases but I can’t seem to work out how to extract specific attribute data by name. For example, I’d like to be able to extract the studyType attribute from the designInfo element.

This is what the XMLElement that I’m working with looks like:

XMLObject["Document"][{XMLObject["Doctype"]["DEXXMLDoc", "System" -> "DEXXMLDoc.DTD"]},
 XMLElement["DEXXMLDoc",
  {"name" -> "RSM-a.xml", "creator" -> "DX8", "version" -> "8.0.6"},
  {XMLElement["designInfo",
    {"studyType" -> "ResponseSurface", "designType" -> "CCD",
     "noOfRuns" -> "20", "noOfFactors" -> "3", "noOfResponses" -> "2"},
    {XMLElement["designNotes", {},
      {"This data file created by Design-Expert 6"}]}],
   XMLElement["blockInfo", {},
    {XMLElement["block", {"code" -> "0", "name" -> "Block 1"}, {}],
     XMLElement["block", {"code" -> "1", "name" -> "Block 2"}, {}]}],

Does anyone have experience in extracting named attribute data from XMLElements?

=================

  

 

Did you get the XMLObject by using something like ImportString[Import[xmlFileName], "XML"]? Including that may help our users.
– Jacob Akkerboom
Jul 23 ’13 at 10:29

  

 

I just used the plain Import function like so: DX8 = Import["RSM-a.xml", "XML", "ReadDTD" -> False]
– CrustyNoodle
Jul 29 ’13 at 5:13

=================

1 Answer

=================

I can’t try it out because your snippet lacks some brackets, but perhaps this will do the trick:

Cases[snippet,
 XMLElement["designInfo", {"studyType" -> atrib_, ___}, ___] :> atrib,
 Infinity]
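
Since the snippet in the question is cut off, here is a minimal sketch for trying the pattern out. The expression assigned to snippet is an assumption: a bracket-balanced stand-in that contains only the elements visible in the question, not the full imported document.

```mathematica
(* Assumed stand-in for the truncated expression in the question:
   only the elements shown there, with the brackets closed *)
snippet = XMLElement["DEXXMLDoc",
   {"name" -> "RSM-a.xml", "creator" -> "DX8", "version" -> "8.0.6"},
   {XMLElement["designInfo",
     {"studyType" -> "ResponseSurface", "designType" -> "CCD",
      "noOfRuns" -> "20", "noOfFactors" -> "3", "noOfResponses" -> "2"},
     {XMLElement["designNotes", {},
       {"This data file created by Design-Expert 6"}]}],
    XMLElement["blockInfo", {},
     {XMLElement["block", {"code" -> "0", "name" -> "Block 1"}, {}],
      XMLElement["block", {"code" -> "1", "name" -> "Block 2"}, {}]}]}];

(* Bind the value of "studyType" when it is the first attribute of designInfo *)
Cases[snippet,
 XMLElement["designInfo", {"studyType" -> atrib_, ___}, ___] :> atrib,
 Infinity]
(* {"ResponseSurface"} *)
```

The result is a list because Cases collects every match; wrapping the call in First would give the bare string when exactly one designInfo element is expected.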

  

 

Thanks, that works perfectly! Easy when you know how eh! Sorry for not posting the entire list – just trying to save space.
– CrustyNoodle
Mar 26 ’13 at 12:22


 

That presumes that "studyType" will always be first, and while the attributes are likely to follow the order in the XML, there is no guarantee it will be first. I recommend using {___, "studyType" -> atrib_, ___} instead. Or reduce the whole pattern to simply HoldPattern["studyType" -> atrib_] :> atrib, dropping XMLElement[…], since Cases will search the entire document when the level spec is set to Infinity.
– rcollyer
Mar 26 ’13 at 12:48
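
The two variants suggested in this comment could be sketched as follows. The expression doc is a hypothetical test document, not the poster's file: it puts "studyType" second on purpose so that the order-dependent pattern from the answer would miss it.

```mathematica
(* Hypothetical test document: "studyType" is deliberately not the first attribute *)
doc = XMLElement["DEXXMLDoc", {},
   {XMLElement["designInfo",
     {"designType" -> "CCD", "studyType" -> "ResponseSurface"}, {}]}];

(* Variant 1: order-independent, still restricted to designInfo elements *)
Cases[doc,
 XMLElement["designInfo", {___, "studyType" -> atrib_, ___}, ___] :> atrib,
 Infinity]
(* {"ResponseSurface"} *)

(* Variant 2: match the attribute rule itself anywhere in the document.
   HoldPattern keeps Cases from reading "studyType" -> atrib_ as a
   pattern -> replacement rule *)
Cases[doc, HoldPattern["studyType" -> atrib_] :> atrib, Infinity]
(* {"ResponseSurface"} *)
```

Variant 2 is shorter but returns the "studyType" value of every element that carries that attribute, so variant 1 is the safer choice if the attribute name could appear on more than one kind of element.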

  

 

You are correct. The order is not guaranteed to be consistent. I’ll try the HoldPattern[…] suggestion as recommended. Thanks for your help.
– CrustyNoodle
Mar 26 ’13 at 22:18

  

 

What if the element itself is an XMLElement? E.g. XMLElement["hoomd_xml", {"version" -> "1.4"}, {XMLElement["configuration", {"time_step" -> "0", "natoms" -> "4806"}, {XMLElement["type", {"num" -> "4806"}, {"0"}]}
– PFD
Apr 30 ’15 at 22:08