How to write a recursive function to analyse this file

I am stuck at the very beginning of this problem.

Assume we have a keyword k (in the example below, struct) and the following structure in the file:

file.txt

struct test12{
  struct myBlock1{
    struct theColor1{
    }
    struct notText{
      struct Light{
      }
      struct super{
      }
    }
  }
}

Can someone point me toward how to write a recursive function that takes file.txt as input and outputs the whole tree of nested elements?
Note: in theory the file can have arbitrarily many nesting levels (we assume around 100 at most).

=================

Something like kName1[chrs, kName2[chrs, kName3[]]]?
– Dr. belisarius
Dec 1 ’14 at 18:40

There is at least one space, newline or tab between k and the name.
– TraceKira
Dec 1 ’14 at 19:15

There are no in k name3. So are the parts actually optional?
– hengxin
Dec 2 ’14 at 7:44

I will remove the random characters to make it easier to implement.
– TraceKira
Dec 2 ’14 at 15:11

It would increase your chance for an answer if you gave a very precise description of the file format and some actual example data with no … in it. For example, do you require handling “comments” (the // things)? Parsing is not trivial in Mathematica. If the file format is sufficiently complex, I’d use external parsing tools such as Boost Spirit, through LibraryLink.
– Szabolcs
Dec 2 ’14 at 17:08

=================

1 Answer

=================

Let me give you at least a start, because in general I like this question. Some things are still unclear. For instance, you wrote in a comment:

and has as child name 2 who has two childs name 3 and 4 . etc

Regarding the example you gave, this is not correct: name2 has one child, name3, which in turn has one child, name4. Look at how you placed the braces. If you want name2 to have two children, it should look like this:

k name2{
  k name3{
  }//end name 3
  k name4{
  }//end name 4
}//end name 2
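The parsing code in this answer is written for comment-free input. If the real file keeps trailing //end … comments like the ones above, one possible preprocessing step is to strip them first. This is only a sketch: stripComments is a hypothetical helper, and it assumes // never occurs inside a name or any other text you need to keep.

```mathematica
(* Hypothetical helper: delete everything from "//" up to the end of each line *)
stripComments[s_String] := StringReplace[s, "//" ~~ Except["\n"] ... -> ""];

text = "k name2{\nk name3{\n}//end name 3\nk name4{\n}//end name 4\n}//end name 2";
StringSplit[stripComments[text], "\n"]
(* {"k name2{", "k name3{", "}", "k name4{", "}", "}"} *)
```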

Furthermore, if the structure of your file is really this consistent, you could probably use a much simpler approach than a recursive parsing function. Let's assume you have a file.txt which contains the following (I left out all the comments)

k name1{
  k name2{
    k name3{
    }
    k name4{
      k name5{
      }
      k name6{
      }
    }
  }
}

then you can write a routine that does what you probably want. Let us define two string patterns that match the beginning and the end of a struct, respectively:

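(* opening line such as "k name3{"; the rule keeps only the name. Whitespace also accepts tabs *)
$start = WhitespaceCharacter ... ~~ "k" ~~ Whitespace ~~ name__ ~~ "{" ~~ WhitespaceCharacter ... :> name;
(* closing line: a single "}" with optional surrounding whitespace *)
$end = WhitespaceCharacter ... ~~ "}" ~~ WhitespaceCharacter ...;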
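To see what the two patterns do, they can be tried on single lines. This small sketch restates them with straight quotes and with Whitespace between k and the name, so that a tab would be accepted there too:

```mathematica
(* The patterns restated so this snippet runs on its own *)
$start = WhitespaceCharacter ... ~~ "k" ~~ Whitespace ~~ name__ ~~ "{" ~~ WhitespaceCharacter ... :> name;
$end = WhitespaceCharacter ... ~~ "}" ~~ WhitespaceCharacter ...;

StringMatchQ["  k name3{", First[$start]]  (* True: an opening line *)
StringReplace["  k name3{", $start]        (* "name3": only the name survives *)
StringMatchQ["}", $end]                     (* True: a closing line *)
StringMatchQ["}", First[$start]]            (* False *)
```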

To parse one element, you need to extract its name (as in name3), parse any elements nested inside it, and parse the closing brace. An element always starts on a line that matches $start:

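(* an element starts on a line matching $start; its name becomes the head *)
parseElement[] /; lineMatches[First[$start]] :=
  With[{head = ToExpression[StringReplace[nextLine[], $start]]},
    head[parseArgs[], parseEnd[]] (* children first, then the closing brace *)
  ];
(* anything else yields an inert ParseErrorAt[...] marker *)
parseElement[___] := ParseErrorAt[parseElement, nextLine[]];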
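The expression head[parseArgs[], parseEnd[]] relies on Mathematica evaluating a function's arguments from left to right, so the children are parsed before the closing brace is checked. One quick way to confirm the order is to record it with Sow and Reap; head here is only an undefined placeholder symbol:

```mathematica
(* Sow returns its argument and records it, so Reap shows the evaluation order *)
Reap[head[Sow["parseArgs"], Sow["parseEnd"]]]
(* {head["parseArgs", "parseEnd"], {{"parseArgs", "parseEnd"}}} *)
```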

The implementation of the functions lineMatches and nextLine is simple, and I will show it at the end. As you can see, every call to parseElement creates a new nameX[…] expression. Since that expression already supplies the closing bracket, parseEnd only needs to check that the current line really is a closing } and then vanish, which is done by returning Sequence[]. Note that we throw away the current } line with nextLine so that we can go on with the parsing process:

(* a closing line: consume it and contribute nothing to the parent *)
parseEnd[] /; lineMatches[$end] := (nextLine[];
  Unevaluated[Sequence[]]);
parseEnd[___] := ParseErrorAt[parseEnd, nextLine[]];
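The vanishing trick can be checked in isolation. In this sketch, vanish is a hypothetical stand-in for parseEnd: because its result is Sequence[], it is spliced away from whatever argument list it ends up in.

```mathematica
(* A function returning an empty Sequence adds no argument to its caller's expression *)
vanish[] := Sequence[];

name3[vanish[]]            (* name3[] *)
name2[name3[], vanish[]]   (* name2[name3[]] *)
```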

Parsing the arguments is simple. If we meet the start of a new element, we parse that element and then go on parsing any further elements that follow it. If we don’t meet the start of a new element but instead the closing } of the struct that is currently open, we simply return nothing and leave that line for parseEnd:

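(* an opening line: parse that element, then any siblings after it *)
parseArgs[] /; lineMatches[First[$start]] :=
  With[{e = parseElement[], r = parseArgs[]},
    Unevaluated[Sequence[e, r]]
  ];
(* a closing line: no further children; the "}" is left for parseEnd *)
parseArgs[] /; lineMatches[$end] := Sequence[];
parseArgs[___] := ParseErrorAt[parseArgs, nextLine[]];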

What is left are the helper functions that give us the next line and let us check a pattern against the next line before throwing it away:

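Module[{data = {}},
  (* read the whole file into a list of lines *)
  open[file_] := data = Import[file, {"Text", "Lines"}];
  (* test a pattern against the next line without consuming it *)
  lineMatches[pattern_] :=
    StringMatchQ[First[data], pattern] /; data =!= {};
  lineMatches[___] := False;

  (* return the next line and remove it from the buffer *)
  nextLine[] := Module[{result},
    If[data === {},
      result = EndOfBuffer,
      result = First[data];
      data = Rest[data];
    ];
    result
  ];
]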

Now, everything should be set up to test it:

open["http://pastebin.com/raw.php?i=QQexfVm4"];
parseElement[]

(* name1[name2[name3[], name4[name5[], name6[]]]] *)
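For comparison, here is one way the much simpler, non-recursive approach mentioned earlier could look for this exact format. It is a sketch, not the answerer's code: it rewrites k nameX{ into nameX[, each } into ], inserts the commas between siblings, and lets ToExpression build the tree. It assumes every name is a valid symbol name that does not clash with a built-in (a struct named List, for example, would turn into an actual list).

```mathematica
text = "k name1{\nk name2{\nk name3{\n}\nk name4{\nk name5{\n}\nk name6{\n}\n}\n}\n}";

(* 1. "k nameX{" -> "nameX["  and  "}" -> "]" *)
step1 = StringReplace[text, {
    "k" ~~ Whitespace ~~ name : WordCharacter .. ~~ WhitespaceCharacter ... ~~ "{" :> name <> "[",
    "}" -> "]"}];

(* 2. a "]" followed by another name means a sibling follows, so add a comma *)
step2 = StringReplace[step1, "]" ~~ WhitespaceCharacter ... ~~ c : LetterCharacter :> "], " <> c];

tree = ToExpression[step2]
(* name1[name2[name3[], name4[name5[], name6[]]]] *)
```

Because the names become global symbols and the rewrite depends on the layout being this regular, the recursive parser is the more robust choice once the format grows.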

Just a comment: “k” can be any string, like “animal”, “plant”, etc., but I think we can consider k = “k” for the moment.
– TraceKira
Dec 3 ’14 at 11:48

@timoftebogdan Since you always used the same k, I thought it was constant, but it is no problem to add it to the pattern and include it in the final tree.
– halirutan
Dec 3 ’14 at 15:02
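Following that suggestion, here is a self-contained sketch (not halirutan's code) in which the keyword is captured instead of hard-coded. Note that $start above only matches lines whose keyword is the literal k, so it cannot match lines such as struct test12{ from the question's file.txt. The hypothetical function parseStructs takes the file contents as a string (Import[file, "Text"] returns one) and returns nested lists {keyword, name, {children…}}. It keeps the answer's mutual recursion: parseElement calls parseArgs for a block's body, and parseArgs calls parseElement for each nested block. Like the answer, it assumes each line holds one opening "keyword name{" or one closing "}"; ParseErrorAt is again an inert marker, here carrying the index of the offending non-blank line.

```mathematica
parseStructs[text_String] := Catch[
  Module[{lines, pos = 1, peek, next, startQ, parseElement, parseArgs, tree},
    (* trimmed, non-blank lines act as the input buffer *)
    lines = Select[StringTrim /@ StringSplit[text, "\n"], # =!= "" &];
    peek[] := If[pos <= Length[lines], lines[[pos]], ""];  (* look without consuming *)
    next[] := lines[[pos++]];                               (* consume one line *)
    (* opening line: keyword, whitespace, name, optional whitespace, "{" *)
    startQ[line_] := StringMatchQ[line,
      WordCharacter .. ~~ Whitespace ~~ WordCharacter .. ~~ WhitespaceCharacter ... ~~ "{"];
    (* element = opening line, its children, closing "}" *)
    parseElement[] := Module[{words, kids},
      If[! startQ[peek[]], Throw[ParseErrorAt[pos]]];
      words = StringSplit[StringDrop[next[], -1]];          (* {keyword, name} *)
      kids = parseArgs[];                                    (* recurse into the body *)
      If[peek[] =!= "}", Throw[ParseErrorAt[pos]]];
      next[];
      {words[[1]], words[[2]], kids}
    ];
    (* children = zero or more elements; parse each one before its later siblings *)
    parseArgs[] := If[startQ[peek[]],
      With[{e = parseElement[]}, Prepend[parseArgs[], e]],
      {}
    ];
    tree = parseElement[];
    If[pos <= Length[lines], Throw[ParseErrorAt[pos]]];     (* leftover input *)
    tree
  ]
]

text = "struct test12{\nstruct myBlock1{\nstruct theColor1{\n}\nstruct notText{\nstruct Light{\n}\nstruct super{\n}\n}\n}\n}";
parseStructs[text]
(* {"struct", "test12", {{"struct", "myBlock1", {{"struct", "theColor1", {}}, {"struct", "notText", {{"struct", "Light", {}}, {"struct", "super", {}}}}}}}} *)
```

Names stay strings, so no symbols are created and a name like List cannot collide with a built-in. For very deeply nested files the evaluator's recursion limit may become the bottleneck; wrapping the call in Block[{$RecursionLimit = 10^4}, …] is one way to raise it.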

Yes, I think so… -_-
– TraceKira
Dec 3 ’14 at 18:05

Is this solution recursive? I can't figure it out, lol 😀
– TraceKira
Dec 3 ’14 at 21:16

I updated the input text with a concrete example :). I tried your solution and it doesn’t work on the concrete example 🙁
– TraceKira
Dec 3 ’14 at 21:51