ColdFusion 9: read XML file
Thanks for the tip. Some great ideas. I actually tried to break the doc up initially and found something stupid - after row 5,, ish the file was full of [spaces] - so the "line" that bot.
Once I split the file up and opened some of the smaller files in a text editor, it was easy to spot. I'm quite happy with how the cfloop performed on a large file. I was pretty skeptical, but it made it through about 5GB of data without a hiccup.
Crania, do you have any examples of this? The XML feed that I am being given does not contain any line breaks, so it does not seem to want to loop over it correctly.
Crania, that's great, but it's not actually handling any of the data, just spitting out the individual lines. Both CF8 and Railo can do this very efficiently.
If I can read one XML "node" at a time, it would be nice to find a way to paginate back and forth between nodes.
I was wondering about finding a way to "index" a large XML file and then retrieve a specific node. I suppose you could keep a prevNode reference after every parsing; it might be simple, or it might get complicated depending on your needs.
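The "index, then retrieve a specific node" idea could be sketched roughly as follows. This is only an illustration, not anyone's posted code: the class name `NodeIndex` is made up, it works on text already held in memory (a multi-gigabyte file would need the same offsets recorded while streaming), and it assumes the indexed element is never nested inside itself and is never self-closing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: records where each <tagName>...</tagName> node
// starts and ends, so any node can be fetched later by its position.
final class NodeIndex {
    private final CharSequence xml;
    private final List<int[]> spans = new ArrayList<>();

    NodeIndex(CharSequence xml, String tagName) {
        this.xml = xml;
        String tag = Pattern.quote(tagName);
        // Opening tag (with optional attributes), lazy body, closing tag.
        // DOTALL lets the body span line breaks, or none at all.
        // Assumption: no nested or self-closing elements of this name.
        Pattern p = Pattern.compile(
            "<" + tag + "(?:\\s[^>]*)?>.*?</" + tag + "\\s*>",
            Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
    }

    // Number of nodes found.
    int size() {
        return spans.size();
    }

    // Text of the i-th node (zero-based), ready to hand to an XML parser.
    String node(int i) {
        int[] span = spans.get(i);
        return xml.subSequence(span[0], span[1]).toString();
    }
}
```

Paginating back and forth is then just a matter of keeping the current position and calling `node(i - 1)` or `node(i + 1)`, instead of holding a prevNode reference.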
Thanks for pointing out those file-based methods, though. I am not sure that I have used those before. Your Java regex search pattern is obviously the way to go, Ben. Very smart indeed.
CF8 definitely had some sweet updates. That kind of file looping can now also be done using the CFLoop tag. I assume these functions are simply the script-based equivalent of what the CFLoop tag is doing.
Good to know them. It's nice to finally use those functions; although, I have to say that I think the tag-based equivalents are a little bit easier to use.
Simply brilliant! It works, but it is Windows dependent and extremely difficult to manage. In the previous scenario, every time I had to make a change, I had to open the VB project, recompile the DLL, uninstall the old DLL, install the new DLL and restart ColdFusion!
Yeah, I've been told that using Java or some compiled DLL is going to be faster; but, I am glad that you are finding this to be more manageable, as it is in the native language!
CFML was looking at the following text. I tried adding to the DB during the loop, as follows. It seems like it's not getting enough time to write to the DB before the loop "moves on". Is that possible?
I was able to parse a node XML file, build and populate a db schema in under 10 seconds using your method.
Let me add my voice to the chorus of people who found this solution immensely helpful. I lifted your regular expression from the cfc and applied the Java Pattern Matcher technique over a number of iterations to break large XML files into database-sized chunks.
I'm using the following expressions for the Matcher. This works great for pulling most of the list items and their associated headers and commentary, except in one recursive case where list items contain other lists that also have items.
The first expression doesn't see the list node at all in this case. An abbreviated version of the offending section of XML follows. When I parse it using the first Pattern, only the head node is returned by the Matcher. Can you suggest a way to get the Matcher to find the list and its items?
I'm OK with not parsing the inner lists and just returning the outer list items, but the recursive structure makes the outer list unmatchable. Ultimately I'm planning to put these into a simple database table that, beyond primary and foreign keys, has just a varchar field to hold the text content of head, p, and item tags, an increment to keep lists separate, a line number to keep the list order, and a bit field that tells me if I'm in a sub-list and need to indent deeper.
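A lazy regular expression cannot tell which closing list tag belongs to which opening one, and that is why the nested case breaks. One alternative, which swaps the Matcher for a streaming parser (StAX, bundled with the JDK), is to count nesting depth while walking the document. The sketch below is an illustration under assumptions: the element names `list`, `head`, `p` and `item` are taken from the description above, the original XML sample is not available, and the `ListFlattener` and `Row` names are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: flattens (possibly nested) lists into table-like rows.
final class ListFlattener {

    // One row: outer list number, line number within it, sub-list flag, text.
    static final class Row {
        final int listNo;
        final int lineNo;
        final boolean subList;
        final String text;

        Row(int listNo, int lineNo, boolean subList, String text) {
            this.listNo = listNo;
            this.lineNo = lineNo;
            this.subList = subList;
            this.text = text;
        }
    }

    static List<Row> flatten(String xml) {
        List<Row> rows = new ArrayList<>();
        int depth = 0;   // how many <list> elements are currently open
        int listNo = 0;  // counter for outermost lists
        int lineNo = 0;  // running line number inside the current outer list
        StringBuilder text = null; // text of the head/p/item being read, if any
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("list")) {
                        // A list opening inside an item: emit the item's own
                        // text first so it is not mixed with the inner items.
                        if (text != null && text.toString().trim().length() > 0) {
                            rows.add(new Row(listNo, ++lineNo, depth > 1, text.toString().trim()));
                        }
                        text = null;
                        depth++;
                        if (depth == 1) { listNo++; lineNo = 0; }
                    } else if (name.equals("head") || name.equals("p") || name.equals("item")) {
                        text = new StringBuilder();
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    if (text != null) text.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("list")) {
                        depth--;
                    } else if (text != null) {
                        if (text.toString().trim().length() > 0) {
                            rows.add(new Row(listNo, ++lineNo, depth > 1, text.toString().trim()));
                        }
                        text = null;
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return rows;
    }
}
```

Each `Row` maps onto the table described above: `listNo` is the increment that keeps lists separate, `lineNo` preserves order, and `subList` is the bit that says to indent deeper. Because only a depth counter is kept, memory use does not grow with file size.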
I also had problems with parsing large XML files. It can parse files up to 35GB and it's really fast. It's really easy to use. You set up parsing rules on a website and test your file online and then use their client code to access your parsing rules from java or javascript.
They have a free developer version at www.
Hi Ben, I find your blog stimulating and informative. It seems that most of the time when I am looking for an idea on how to approach a particular problem, you have already tackled it. I am working on parsing a rather large XML document, so I naturally found this page. I have one small wrinkle to add to the problem. In my case the XML may contain sub nodes of the same type. When I try parsing with your code here, I bomb out when I hit the embedded value tag. The code tries to pass the partial node to XmlParse, which complains that the start and end tags don't match.
I really don't know why I'm having trouble; this seems so easy. Things still don't work. Here's the XML page: books.
Nice article. However, it seems we have both written on the same subject with a similar example. This is my version.
Anuj, if you are trying to imply that I got the idea to write this article from reading yours, then I assure you that is not the case. I have never seen your article before today.
Beyond the use of a book related XML structure and the general topic of using xPath expressions, our articles differ quite a bit in the intended audience. Mine is geared as more of an introduction to xPath expressions and how to use them to parse the information you need from an XML document and it hints at the end that there is much more you can do with xPath expressions.
Your article seems to be geared more toward developers who already know a little bit about xPath expressions as it doesn't really explain what xPath expressions are and you expect your audience to be able to figure it out for themselves by reading the code. Your article shows more advanced ways to use xPath, which I must admit I found quite interesting because I had never thought of using the count or sum functions the way you have demonstrated there. Pretty neat stuff.
Scott, you got me wrong. I was not trying to say that your article had anything to do with mine; all I was saying is that we ended up writing on a similar subject. And we are all sharing knowledge, so why bother anyway.
Ok, that's cool. Sorry for the misunderstanding.
Have you used lower-case along with contains to make the contains case-insensitive? When I try it, it says that lower-case is not found. Is this not available in the Java behind XPath searches in CF?
Stores ColdFusion Administrator settings for memory variables, CFX tags, Custom tag paths, Mappings, CORBA connectors, sitewide error handler, missing template handler, Enable HTTP status codes, Time out requests, Log Slow Requests, Caching, default mail charset, default charset.
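On the `lower-case` question above: `lower-case()` is an XPath 2.0 function, and an XPath 1.0 engine will report it as not found. The usual XPath 1.0 workaround is `translate()`, which maps upper-case letters to lower-case ones before `contains()` runs. The sketch below uses the JDK's built-in `javax.xml.xpath` (an XPath 1.0 implementation) and assumes, purely for illustration, a books document with `title` elements. The class and method names are made up, and whether a given ColdFusion version behaves identically is not something this example verifies.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper showing case-insensitive contains() in XPath 1.0.
final class CaseInsensitiveContains {
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    // Counts <title> elements whose text contains the needle, ignoring
    // ASCII case. The <title> element name is an assumption.
    static int countTitlesContaining(String xml, String needle) {
        final String lowered = needle.toLowerCase(Locale.ROOT);
        // translate() lower-cases the node text; only A-Z is handled.
        String expr = "//title[contains(translate(., '" + UPPER + "', '"
            + LOWER + "'), $needle)]";
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Passing the needle as a variable avoids quoting problems.
        xpath.setXPathVariableResolver(
            q -> "needle".equals(q.getLocalPart()) ? lowered : null);
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

The same `translate(...)` expression can be used anywhere an XPath 1.0 string is accepted; only the letters listed in the two alphabets are folded, so accented characters would need to be added by hand.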
Causes ColdFusion to watch its configuration files and automatically reload them if they change.
Thanks Leigh, this worked well. It's not as fast as I'd hoped, but I am testing on my local computer, which I bet slows it down a bit. It's taking about 5 mins. I think I will attempt to create a single query and see if that speeds things up. My XML file will probably have 10s of 1,000s of records.
Well, looping is time consuming any way you slice it. Also, databases have limits on how much SQL they can process at one time. So you may not be able to do everything in one query. But breaking it into chunks, to reduce the number of database hits, should help. Also, does your database support bulk loading of XML instead? MS SQL does. Its bulk loading tool is a COM object, but it is pretty fast. I am also wondering if it would be better to transform the data into smaller flat files, as most databases have tools for bulk importing text files.
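The "break it into chunks" suggestion might look something like this in Java with plain JDBC batching. It is a sketch under assumptions: the table `xml_records` and its column `text_content` are hypothetical, as are the class and method names, and the right batch size depends on the database and driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: sends rows to the database in fixed-size batches
// instead of one statement per row or one giant statement.
final class Batcher {

    // Splits records into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> chunks(List<T> records, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < records.size(); i += size) {
            int end = Math.min(i + size, records.size());
            out.add(new ArrayList<>(records.subList(i, end)));
        }
        return out;
    }

    // Inserts all values, one database round trip per chunk.
    static void insertAll(Connection conn, List<String> values, int batchSize)
            throws SQLException {
        // Hypothetical table and column names.
        String sql = "INSERT INTO xml_records (text_content) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<String> chunk : chunks(values, batchSize)) {
                for (String value : chunk) {
                    ps.setString(1, value);
                    ps.addBatch();
                }
                ps.executeBatch(); // one hit per chunk
            }
        }
    }
}
```

With tens of thousands of records and a batch size of a few hundred, this turns tens of thousands of database hits into a few hundred, while keeping each request well below any statement-size limit.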