FLOSS Weekly 97/Transcript
|Hosts: Leo Laporte and Randal Schwartz|
Guest: Wolfgang Meier
Recorded: November 25, 2009
Published: November 30, 2009
- 1 Transcript
- 2 Sponsorship info (omitted)
- 3 Randal's discussion of Ignite Portland (omitted)
- 4 Discussion of show content begins (6m:45s)
- 5 Leo's ad for Ford SYNC (omitted, 8:10)
- 6 Wolfgang Meier enters (12:07)
- 7 The Background and History of eXist-db (13:20)
- 8 Relational Databases vs. XML Databases (15:45)
- 9 Overview of XML databases vs. LAMP stack (25:00)
- 10 eXist as a "NoSQL" database (27:00)
- 11 Transaction support (31:35)
- 12 Security, roles (32:08)
- 13 Architecture of eXist (34:15)
- 14 Hosting and Deployment of eXist (34:48)
- 15 eXist-db Project Development platform (35:44)
- 16 Sites running eXist: U.S. State Department Office of the Historian website (36:08)
- 17 Competitor products (37:20)
- 18 How did eXist win over the State Department, advantages of eXist (38:25)
- 19 Standards, XML, TEI (41:00)
- 20 eXist's use of Lucene (41:45)
- 21 XQuery as a Server Language (42:50)
- 22 Applications Beyond the Humanities (43:20)
- 23 Randal's Overall Judgment About eXist (44:45)
- 24 Wolfgang's Call for Participation (45:40)
- 25 Thanks & Close of Interview with Wolfgang (46:35)
- 26 Post-Interview Discussion About Significance of eXist, XQuery, and history.state.gov (46:59)
- 27 Joking about how so many listeners have gone to history.state.gov that the response time slowed briefly (omitted)
- 28 Thanksgiving wishes, closing (51:45, omitted)
This transcript was prepared by members of the eXist-db user community
INTRO Netcasts you love, from people you trust. This is TWiT.
LEO LAPORTE Bandwidth for FLOSS Weekly is provided by Cachefly.
This is FLOSS Weekly, Episode 97 for November 26, 2009: eXist-db.
Sponsorship info (omitted)
It's time for FLOSS Weekly, the show that covers everything having to do with free, libre, and open source software, and here he is, the star of our show, ladies and gentlemen, Mr. Randal Schwartz. Hey, Randal!
RANDAL SCHWARTZ Hey, Leo! Good to talk to you again.
Randal's discussion of Ignite Portland (omitted)
Discussion of show content begins (6m:45s)
Leo Laporte Who do we have on the show today, Randal?
Randal Schwartz You know, Leo, we've had a few shows in the last month or two that have been - sort-of - anybody can understand what we are talking about because it's just a distro or a general technology or something. I decided just to mix up the pitches just a little bit: Today's show is a high-end technical, geeko, massive TLA, three-letter acronyms, mass-out show. We have brought on the Wolfgang Meier, a good German man who, many years ago actually, almost all the way back to 2000, created this thing called 'eXist-db.' Now, if that is not already geeky enough, eXist-db is an XML database which means that instead of having traditional rows and columns like you'd have in an SQL database, it actually stores the documents, as far as I can tell, stores the documents, in their native XML shape, which is more hierarchical than rectangular, and there is all sorts of XSLT involved and XQuery language and REST and SOAP and - I tell you, I think we're going to have more acronyms than we've had in a show in a long time, so if your eyes start glazing over, I'll know, it's no problem!
Leo Laporte No, I think you will all enjoy it, but before we get to that though... I want to give you one more acronym: SYNC - we should come up with what that means! Ah, "Say your next command" - how about that?
Leo's ad for Ford SYNC (omitted, 8:10)
Wolfgang Meier enters (12:07)
Randal Schwartz Alright, are you ready to go to Germany?
Leo Laporte Yeah, I think he's in Frankfurt. Wolfgang Meier on the line, are you in Frankfurt?
WOLFGANG MEIER Yeah, sure, close to Frankfurt, actually.
Leo Laporte Okay, that's probably all I'll be able to figure out, so...
Randal Schwartz How close to Frankfurt? Is there a name, just for the five people that probably live near you.
Wolfgang Meier Rüsselsheim.
Leo Laporte I've been to Darmstadt.
Wolfgang Meier Actually - ah, yeah, that's not too far away, just 20 kilometer or so.
Leo Laporte I've been to Darmstadt, so I know a little bit about that.
Wolfgang Meier Yeah, Rüsselsheim is quite famous right now because it is the headquarters of Opel, the company which General Motors tried to sell. Oh, well, they didn't sell it right now, but they were supposed to do it, so...
Leo Laporte Well, we have another connection because my very first car was an Opel Kadett!
Wolfgang Meier Ah, right, okay!
Leo Laporte I know how to fix, I can tune that car! I can fix that car, because I had to. I was an impoverished student.
Randal Schwartz So this whole show can be about things about Germany we didn't know we had connections with.
The Background and History of eXist-db (13:20)
Leo Laporte But let's not! Let's talk about eXist-db. Are you the chief developer on this, Wolfgang, or is there a large team working on it?
Wolfgang Meier I created the project, quite some time ago. Actually it's now nearly ten years. And I have been working on it for the first few years, more or less on my own. But then, well, new people started to contribute and I think that right now we have about, well we have a core team working together every day like, oh, five-six people, and then there are, I don't know. And then there are ten-twenty-thirty more people contributing to certain areas - certain features or extensions, indexes, query features, whatever.
Randal Schwartz So what was the problem you were trying to solve at the beginning? What problem were you trying to solve? What was missing that had you start this?
Wolfgang Meier Oh, when I started the project in, around 2000, summer 2000, well, XML had already been released as a specification, but not much more. So in the beginning we just had XML documents. And I was working with SGML before that, and well, we had XML documents and we used at that time still, eh, that horrible stylesheet language called "DSSSL" which was a Scheme dialect. But there wasn't much more, no way to query those documents, and I had lots of those documents, and I thought about how, how I could query them efficiently. And I ended up in my summer holiday reading a few papers about efficient indexing of XML, and it was all quite new at that time, and I decided to implement some of the procedures and algorithms I found in those papers. And, yeah, that's how it started and over time, it evolved, and as more and more standards came into the XML family, so now it is a whole stack of standards, but it all grew over the years.
Relational Databases vs. XML Databases (15:45)
Randal Schwartz OK, so you started with something that was just really efficient with indexing because you read this paper on it. Now, one of the things that I am familiar with in terms of XML is that it sort-of doesn't shape well in traditional databases, because it's all about hierarchical data, rather than, you know, rows and columns. Now, is that one of the problems you were trying to solve initially with this indexing, that is, how do you get efficient indexing, because it is triangular rather than square?
Wolfgang Meier Yeah, well, the huge advantage of XML databases in general is that, well, you don't need to shred your XML document into relational tables anymore. But, what the database does, it takes the model of your document, basically the tree model behind it, and directly stores that into an efficient data structure, so you can query that immediately, without creating a mapping scheme or anything. You just put your data into the database and then start writing your queries. So it's very good for a rapid prototyping approach. You don't need all this mapping to and from tables. So that one advantage all XML databases have and that's why they are used. In eXist in particular, I was interested in how to address nodes within that document tree so I could efficiently access them at any time without loading the entire document back into memory, because that is what you don't want, if you have really large documents.
Randal Schwartz Well, you could address those with XPath. Did XPath exist at that time?
Wolfgang Meier Yeah, XPath already existed in a rough version, I think. But well, the problem just is, like, XPath, the core expressions in XPath are like: "Well, find me all title tags which are descendants of chapters" -- things like that. So basically, what eXist does, it applies a so-called numbering scheme to the document tree, and this numbering scheme allows you, if you have the number of the title, you can immediately tell if this number can actually be a descendant of any chapter, so it is just by comparing those numeric node document IDs that you can resolve those path expressions.
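Illustration (not spoken in the episode): the idea of resolving a path expression by comparing node IDs can be sketched in Python as follows. The sketch assumes a simple hierarchical (Dewey-style) numbering in which each node's ID is its parent's ID plus its position among its siblings. The interview does not say which numbering scheme eXist uses, so this is a stand-in, and the names `number_nodes` and `is_descendant` are made up for the example.

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Give every element a hierarchical ID such as (1, 2, 1).

    Assumed scheme for illustration: parent ID + 1-based child position.
    """
    ids = {}

    def walk(elem, node_id):
        ids[node_id] = elem
        for pos, child in enumerate(elem, start=1):
            walk(child, node_id + (pos,))

    walk(root, (1,))
    return ids


def is_descendant(node_id, ancestor_id):
    """Decide the ancestor/descendant relation from the two IDs alone."""
    return (
        len(node_id) > len(ancestor_id)
        and node_id[: len(ancestor_id)] == ancestor_id
    )


doc = ET.fromstring(
    "<book><title>Top</title>"
    "<chapter><title>One</title><para>text</para></chapter>"
    "<chapter><title>Two</title></chapter></book>"
)
ids = number_nodes(doc)
chapters = [i for i, e in ids.items() if e.tag == "chapter"]
titles = [i for i, e in ids.items() if e.tag == "title"]

# "All titles that are descendants of chapters", resolved without
# walking the tree again: only the IDs are compared.
chapter_titles = [t for t in titles if any(is_descendant(t, c) for c in chapters)]
result = [ids[t].text for t in chapter_titles]
print(result)
```

The point of the design is that once the IDs are stored in an index, the structural test needs no access to the document itself.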
Randal Schwartz Right, so you start with an XML document, you can query it with XPath, and then you, then as data matured, as specifications matured, you added XQL to it, I understand, right?
Wolfgang Meier Ah yeah, later on there were those other attempts to create a query language, and we finally ended up with XQuery, which is a rather complete, functional language, I think, which means it is not just a query language like SQL is. We use XQuery pretty much also as a server-side web development language. I have written quite a few web sites, just using XQuery and things like XSLT. So it is more, it's not like just a query language, but it is also a replacement for things like PHP or JSP or whatever. The good thing is that you can use XML and related standards to create an entire web site and you don't need to bother with any other high-level languages.
Randal Schwartz So let me understand this then, so you can start with XML documents as your data source. You would use XQL to pull out pieces of that -- XQuery language -- and then you would also maybe process that with some XSLT perhaps -- or is that parallel to XQL? -- and then it would spit out entire web pages. The browser couldn't tell that you've got all this machinery behind the scenes. Is that a description?
Wolfgang Meier Yeah, sure, for example, our entire web site is powered like that. All the tools on the web site, even the tools to try out XQuery stuff or to look up the function documentation, whatever, it is all just XQuery running in the background, on the server, on the database, and that XQuery directly has access to the XML documents stored in the database and all the indexes and stuff.
Randal Schwartz Well, this seems a bit unusual, compared to traditional approaches, like PHP and MySQL. So, would someone coming from a PHP world be completely confused by all this? Is there a transition path, or is it actually pretty natural, once you start playing with it - of course you'll say yes to that - but tell me how I would get from the PHP world to where you are talking about.
Wolfgang Meier Oh, we have quite a lot of users who start with PHP or even Perl or whatever they learned first, and then they discover eXist or another XML database, and, well, they start programming to the database. We offer SOAP interfaces or REST-style interfaces, XML-RPC. eXist has 6-7 different network interfaces you can use it from. So people still continue to write PHP scripts until at some point, they start to create more and more complex XQueries, and finally discover, okay, basically I could just replace my entire PHP script that I had previously with my XQuery. So it is a step-by-step learning process, I think.
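Illustration (not spoken in the episode): a script outside the database might talk to a REST-style interface by putting the query into the URL. The sketch below only builds such a URL and sends nothing. The host, port, collection path and the parameter names `_query` and `_howmany` are assumptions for the example and should be checked against the eXist documentation for the version in use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def rest_query_url(base, collection, xquery, howmany=10):
    """Build a GET URL that carries a query as a URL parameter.

    Parameter names are assumptions for illustration, not a verified API.
    """
    params = urlencode({"_query": xquery, "_howmany": howmany})
    return f"{base.rstrip('/')}/{collection.strip('/')}?{params}"


# Hypothetical server address and collection.
url = rest_query_url("http://localhost:8080/exist/rest", "/db/blog", "//entry/title")
print(url)

# Decoding the URL again shows the query survives the escaping intact.
decoded = parse_qs(urlsplit(url).query)
print(decoded["_query"][0])
```

A PHP or Perl script could build the same URL and fetch it with its usual HTTP client, which is the step-by-step path described above.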
Randal Schwartz So this would be similar to taking, say, a traditional database with stored procedures like, say Postgres, and discovering that I can move a little more of my application into the Postgres stored procedures, rather than computing it in my view code, I can push that more and more into the data processing code, until eventually, my entire program is one line of PHP, one line of Perl, that say: Go, take this data and give it the URL content for it. Right?
Wolfgang Meier Yeah, exactly, a little bit like that, except that, well it is not just XQuery, but it is the entire XML stack you have available. You can combine it with XForms to create web-based forms, and it is very natural for XForms to interact with an XML database, because your form, it would just send an XML document back to the database which would store it or process it, with XQuery or whatever. What's also very natural, I think, for people coming from Perl or PHP is that eXist allows you, in the same way as SQL has stored procedures, it allows you to store those XQueries into the database, and then you can map them to an URI or whatever, so basically you end up with a RESTful architecture, where URI's do directly translate to queries running in the background, which access the database. And, from there, it is rather easy to create a modularized application.
Randal Schwartz Wow! So let us bring this back to something I can relate to again, so let's say you had a blog, and you wanted to manage your blog, using this system, I could store an XML doc that had "abstract, posting, subject, date, body, reference URL" or something, repeated over and over again in an XML doc, with one particular schema, and then generate some XQL pages that would extract this and provide the view for the typical blog visitor. But I could also have another URL published, that would directly execute XQL to go give me the RSS feed for that blog? Does that make sense?
Wolfgang Meier Yeah, sure, absolutely. If you look at our own blog or wiki, it works like that. It stores all the entries and feeds as Atom. Atom is another XML standard. It directly stores that stuff as an Atom feed and what you are editing on the wiki page is actually the Atom entry, and it is one XQuery which does the entire rendering to display the translation of the Atom feed into a web page.
Randal Schwartz Wow!
Wolfgang Meier And you can also certainly directly click on the feed link and get the live Atom feed back.
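Illustration (not spoken in the episode): the arrangement described here, one stored Atom feed served both as a rendered page and as the raw feed, can be sketched in Python. In eXist the rendering would be an XQuery; here a plain function stands in for it. The feed content, the URL paths and the `ROUTES` table are invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Stand-in for a feed stored in the database.
STORED_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry><title>First post</title><summary>Hello</summary></entry>
  <entry><title>Second post</title><summary>More</summary></entry>
</feed>"""


def render_html(feed_xml):
    """Turn the stored Atom feed into a minimal HTML page."""
    feed = ET.fromstring(feed_xml)
    page = ET.Element("html")
    body = ET.SubElement(page, "body")
    ET.SubElement(body, "h1").text = feed.findtext(ATOM + "title")
    for entry in feed.iter(ATOM + "entry"):
        ET.SubElement(body, "h2").text = entry.findtext(ATOM + "title")
        ET.SubElement(body, "p").text = entry.findtext(ATOM + "summary")
    return ET.tostring(page, encoding="unicode")


# Each URL maps straight to a piece of code over the same stored data.
ROUTES = {
    "/blog": lambda: render_html(STORED_FEED),  # page for human visitors
    "/blog/feed": lambda: STORED_FEED,          # live feed for feed readers
}

html = ROUTES["/blog"]()
print(html)
```

Because both URLs read the same stored document, editing an entry changes the page and the feed together.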
Overview of XML databases vs. LAMP stack (25:00)
Randal Schwartz This all sounds expensive, in performance-wise. Am I going to get the same sort of throughput out of a box that I would get if I just had a LAMP-stack on that?
Wolfgang Meier Well, it depends, I mean (laughter, all three), there is certainly a price you have to pay, but I think that is a price you have to pay for the entire technology. In the beginning of XML, people used to believe that you could really replace your good old MySQL or Oracle database with an XML database. That might be true, but I never really believed that it is always useful. I think there are lots of use cases where the good old relational database is still, well, a very good choice, if you have well-structured data which you can easily manage, where you can define a good, fixed schema, which you do not need to change afterwards, then why switch to an XML database? It doesn't make much sense, doesn't help much. But, if your application is going to be more complex, and you need to mix heavily-structured, well-structured data with less regularly-structured data, if you have a mixture of text-like documents with table-like data, then it makes a lot of sense to use an XML database. And, in that case, you certainly have to pay a price for all the processing power XQuery has to offer and for the simplicity of its use. I think that if you do it in an intelligent way, you will always be able to, get enough performance out of it.
eXist as a "NoSQL" database (27:00)
Randal Schwartz So this puts you in the same league, I would guess, as these, new-fangled, document-based systems like CouchDB and Mongo and things like that. Are you exchanging ideas with them now, or are you just saying, well, it took you six years to get to the party, but we're already here? Or where are you with all those guys?
Wolfgang Meier Well, I am very interested in those technologies and I'll really be very certainly watching them and what they do, and I think, there are a few interesting developments we can benefit from. For example, if I look at some of the key-value-based storage backends, then I could also think about using those backends if they are very efficient within eXist. Why not? So I think we are learning from them, and, I am also always interested in incorporating other people's open source code, if possible, so that's an interesting development. But, well, the entire XML thing is a lot about standards, and there is not just one standard: XML's got a lot of other standards around it which have developed throughout the years. I mean, let's say there is Atom for content publishing, there are standards like TEI, the Text Encoding Initiative, which is heavily used within the humanities. There are lots of other XML-based standards, and I think that's the big advantage of the whole XML thing: You have all those standards working together, and in the end it should all be, well, it should all integrate into one framework which is as easy as possible to use for the end-user. That's our role.
Randal Schwartz So it seems like you would be, then, as you are saying, if I understand this, to take this existing XML standards and use appropriate XSLT and XQuery against those incoming and outgoing documents to, and, like, this database is closer to all that, being an XML native database. So for those applications where you're going to interface with some existing standards, that makes sense to you, something like this, whereas, you know, the other key-value-pair databases like Mongo and things like that, they're going to be a further distance in mapping?
Wolfgang Meier Yeah, sure, I think that key-value-pair databases, for example, are extremely efficient, if you have, well, limited requirements where you just need an extremely fast data throughput. But if you have a scenario with a dataset which has been growing over the years and which incorporates highly-structured text documents and combines it with other data and things like that, then we are certainly better off looking into XML standards and all the tools that help you process those standards.
Randal Schwartz Now, you are calling this a database, is this something that can take a portion of an already existing XML document, replace it with something new, or are you only doing functional applications, turning an entire tree into another tree and then replacing the whole thing?
Wolfgang Meier No-no, it can update parts of the document tree.
Randal Schwartz So it's essentially it's all of our typical database applications, like the equivalent of updating a row in a table, rather than just regenerating the table then.
Wolfgang Meier Yeah, sure, you would update a single element within that document, not the entire document, you don't need to do that. I think from, concerning its background, the storage backend, it pretty much works like any other database, be it relational or not. I mean, you have cache, you have, at least internally you have transactions, things like that, so the technology in the background is not that different from unique B-trees, stuff like that, so the documents are all stored, yeah, the documents are not stored on the file system.
Transaction support (31:35)
Randal Schwartz So the transactions, you say they are only internally, can I have two processes running at the same time and not jumble up the data then, or is there some way for me to say, I am moving money from this account to this account and somebody else is moving some money out of that and not having the numbers get all jumbled?
Wolfgang Meier Yeah, that's possible, but right now our transaction support is still a little bit limited - it needs some more development. Right now, we mostly use transactions internally, so the user cannot define the boundaries of the transaction himself, but it's done automatically. You can certainly, say like, I want to lock that document of that collection now and then process a few queries, a few updates and then unlock it afterwards. But right now there is not a user-defined roll-back. That's a missing feature right now, well, which wasn't too much, so far it wasn't too much a problem for our users, so...
Randal Schwartz Well, yeah, it really depends on the use case, so, I mean, you are obviously not writing, you know, industrial, financial transactions with this yet, in which case you'd need that sort of thing, but if we are doing things like blog and web site publishing, things like that, then that would make sense to not quite need that just yet.
Security, roles (32:08)
Randal Schwartz How about security, are there roles and things in this, so that I can partition the data based on incoming credentials?
Wolfgang Meier Oh yes, sure, there are various mechanisms, and out of the box we support Unix-like access rights on your documents and collections. I have to say, for the user the database very much looks like a hierarchical directory where I just put my documents into so-called "collections", which are hierarchical, and you can certainly control access to those collections as well as documents, using a permissions system. What is also possible is to define roles and things, using the standard called "XACML", and this also allows you to restrict access to certain function libraries, so you can keep users from posting queries which could damage your system in any way.
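Illustration (not spoken in the episode): "Unix-like access rights" on documents and collections can be pictured as an owner, a group and a mode such as 0o640 attached to each resource. The Python sketch below shows only that general idea. The `Resource` class, the `allowed` function and the sample users are hypothetical and do not reflect eXist's actual permission API.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Resource:
    """A document or collection with Unix-style ownership and mode bits."""
    path: str
    owner: str
    group: str
    mode: int  # e.g. 0o640: owner rw-, group r--, others ---


def allowed(resource, user, groups, wanted):
    """Check whether `user` (a member of `groups`) has all `wanted` bits."""
    if user == resource.owner:
        bits = (resource.mode >> 6) & 7
    elif resource.group in groups:
        bits = (resource.mode >> 3) & 7
    else:
        bits = resource.mode & 7
    return (bits & wanted) == wanted


# Hypothetical document in a hypothetical collection.
doc = Resource("/db/blog/entries.xml", owner="editor", group="staff", mode=0o640)

print(allowed(doc, "editor", set(), READ | WRITE))
print(allowed(doc, "alice", {"staff"}, WRITE))
print(allowed(doc, "guest", set(), READ))
```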
Architecture of eXist (34:15)
Randal Schwartz What's the architecture: this is Java, right?
Wolfgang Meier Yeah, that's Java.
Randal Schwartz So all the code is in Java, there are not some pieces in some other languages, just one big "block" of Java?
Wolfgang Meier (Laughter) Yeah, I think pretty much most of it is Java, though certainly many tools around it are also written in XQuery, already.
Hosting and Deployment of eXist (34:48)
Randal Schwartz Oh, okay, cool. So what kind of hosting environment does it require to run, I mean, and are there pieces that can be put together as well, or is it just one big thing?
Wolfgang Meier Well, basically it has been designed as a library which could be used embedded. We have some users which are using eXist embedded in desktop applications, but you could also simply embed it into any servlet container. So, whatever servlet container you have - Tomcat, Jetty - it can just be integrated into your web application or you could certainly also run it stand-alone and access through the network.
Randal Schwartz And so you are editing the source code of this with other people, using whatever is best for Java in terms of script development?
eXist-db Project Development platform (35:44)
Wolfgang Meier Yeah, we are pretty much using Subversion, well, as IDE, we are a little bit divided, we have some Eclipse users, some are using IntelliJ, well, I'm sometimes using vi or emacs, whatever.
Sites running eXist: U.S. State Department Office of the Historian website (36:08)
Randal Schwartz (Laughter) So, where might I see an example of, other than the eXist database site itself, are there other sites running this that I might be familiar with?
Wolfgang Meier Yeah, there are a few live sites, for example, the U.S. State Department has a pretty nice site online which is all written in XQuery, and I think it is one of the most well-designed web sites for, well, viewing historical documents, you know they have, well, the State Department publishes those historical volumes, I think one a year, and they have now started to present this online and, so if you go to history.state.gov, you will see a web site which is entirely powered by XQuery scripts and eXist in the background.
Randal Schwartz Wow, and so: when did they decide to use that and what were they comparing it with, what else is competing in this space?
Competitor products (37:20)
Wolfgang Meier Oh, there are some pretty large commercial companies competing with us, so to say, first of all, nearly all the major SQL database vendors do also offer XML database expansions, extensions, whatever you call it, so it is like, well, it's more like you can define one part of your table to contain XML and then you can also use XQuery to query that, so that's, Oracle has it, IBM has it, Microsoft SQL Server has it, so those are the mixed approaches. And there are also commercial vendors of native XML databases, I think MarkLogic is right now developing into the biggest one, yeah.
How did eXist win over the State Department, advantages of eXist (38:25)
Randal Schwartz So how did you win over those, how did the State Department pick you instead of one of those? You have any insight into their decision process?
Wolfgang Meier (laughter) Well, I know those who developed the web site, but I am not into the decision process, I didn't try to sell eXist to them, but I think one advantage eXist has is, well, I think it is pretty straightforward to install, to get it up and running is not that difficult. We also always try to provide an integrated environment where you can immediately start hacking away your code and create your own stuff. And, as far as I know, for example, the main man behind that historian web site did not have much experience in any higher programming languages before. He just started with XML documents and then somehow had to find his way through XQuery, and I think he has been pretty successful with that.
Leo Laporte This is really neat. Was the data somehow in XML beforehand or did he have to perhaps create an XML backend for it?
Standards, XML, TEI (41:00)
Wolfgang Meier No, for documents like this, there is basically one standard, it's the TEI, the Text Encoding Initiative.
Leo Laporte So, I see. So you could write a translator that would translate it into well-formed XML and then put it...
Wolfgang Meier Yeah, it is an XML, it is a TEI XML schema, mainly used by, well historians or other people in the humanities, so it is a full-blown, large, very complex standard. I think that nowadays, if you want to publish anything in the humanities, you have to know TEI.
Leo Laporte That's really great. So it seems likely that this kind of structured data will - there'll be more of it all the time and you provide a great front-end for it.
Wolfgang Meier Yeah, sure, I think it's still, I think XML is still developing, despite all criticism about it being too complex and all those angle-brackets and things. I think, well, people more and more recognise that standards are important and there is no way around XML if you are talking about standardisation: If you really want to take those documents and give them away to other people and exchange them with other projects, you have to use XML and you have to use the standards on which people have been working for years. I mean, TEI has been around for, I don't know, I think it has been around before XML even, so it has really been thought out.
eXist's use of Lucene (41:45)
Leo Laporte I notice you use my friend, Doug Cutting's Lucene engine for the text searching. That's something new?
Wolfgang Meier Yeah, that's something new, we had it in the 1.4 release. Well, the problem was to integrate Lucene into XQuery. So, well, in XQuery you don't want to query the entire document, you want to query certain parts of that document, so you want to say, "find me all paragraphs or chapters having a title containing - whatever." So the new thing is a tight integration between Lucene and the XQuery engine, so you can really use Lucene on the node level: Define indexes on single nodes, have Lucene scoring apply to your queries, and stuff like that.
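Illustration (not spoken in the episode): indexing "on the node level" means the full-text index points at individual elements rather than at whole documents. The Python sketch below builds a tiny inverted index over chosen element names only. It is a toy stand-in and not Lucene: it does no scoring or analysis, and the function name `build_index` and the sample document are invented.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(root, indexed_tags):
    """Map each term to the set of node IDs whose text contains it.

    Only elements whose tag is in `indexed_tags` are indexed.
    """
    index = defaultdict(set)
    nodes = {}

    def walk(elem, path):
        if elem.tag in indexed_tags:
            nodes[path] = elem
            text = "".join(elem.itertext()).lower()
            for term in re.findall(r"\w+", text):
                index[term].add(path)
        for pos, child in enumerate(elem, start=1):
            walk(child, path + (pos,))

    walk(root, (1,))
    return index, nodes


doc = ET.fromstring(
    "<book>"
    "<chapter><title>Indexing XML</title><para>Some body text.</para></chapter>"
    "<chapter><title>Full text search</title><para>Notes on XML.</para></chapter>"
    "</book>"
)

# Index titles only: a paragraph mentioning XML must not produce a hit.
index, nodes = build_index(doc, {"title"})
hits = [nodes[p].text for p in sorted(index["xml"])]
print(hits)
```

The second chapter's paragraph mentions XML but is not returned, because only title nodes were indexed.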
Leo Laporte I killed you with the Lucene question there, sorry about that, Randal!
XQuery as a Server Language (42:50)
Randal Schwartz I do appreciate the fact though that, you know, I hadn't seen much of XQL until I was looking at the examples on your web site earlier today for the database. I must say that now that I've looked at it, I am happy that actually they've found, created a language that isn't entirely driven by angle-brackets! I mean, one of the things I really hate about XSLT is the demand to make XSLT's controller look like XML data drove me nuts, so XQL I can finally probably start working with! (laughter)
Applications Beyond the Humanities (43:20)
Wolfgang Meier Yeah, I also think it is important to point out that, well, though we have a lot of users in the humanities, there are also completely different applications powered by the same technology, for example, I have been working for a long time with a company which has a, well, a business, how do you call it, ERP Tool, and they model, basically they customise databases for their customers and they're using eXist, so to say, as a backend which contains the model from which they then generate the Oracle database, so, that's one use for example, or, well, other people use it, well, to collect statistics from production machines, things like that, so there is a whole wide range of different uses. It is not at all limited to, well, just only complex text documents, but there are lots of other possible applications for XQuery and XML dbs and stuff.
Randal's Overall Judgment About eXist (44:45)
Leo Laporte Wow!
Randal Schwartz Well, this sounds like a really interesting project. If I had a lot more XML data to deal with, I'd be jumping into this right away. Unfortunately, it's not what my current client demands that I do, so I can't really spend a lot of time with this, but it sounds like you have a really interesting stack of tools here: All the way from being able to store XML documents and query them efficiently, all the way out to generating whatever documents needed to do that, and providing the array of interfaces, providing SOAP and REST and all sorts of things means that this is really, can be a glue component inside a much larger system. And, frankly, that's very fascinating. And the fact that you've been working on it for a while shows, because you've got a lot of stuff going on and I know you're head of a pretty wide team of people that are working on this. So I think this is definitely something, if any of our listeners are looking at doing that sort of thing, this is definitely something that I would encourage you to take a look at deeper. So - you have any last words before we wrap up here today?
Wolfgang's Call for Participation (45:40)
Wolfgang Meier No, I would certainly also like to encourage people to not only, well, work with eXist and try it out and everything, but we're also always happy about getting new contributors, be it in coding if you're a good Java coder, you're welcome, but also, if you are a documentation writer or if you are good at XML-XQuery, there's always lots of stuff to work on, so... Our team is, yeah well, very active and we try to also, well, work close together and have a good time.
Thanks & Close of Interview with Wolfgang (46:35)
Randal Schwartz Well, it's really cool. Thank you for being - thank you for staying up late, because I know it's late in your time, it's midday in our time, so I know it's really late there in Germany. I appreciate that, well, it's programmer time, what am I talking about, there we go!
Leo Laporte It's when he is most productive!
Randal Schwartz Thank you, thank you very much for this, I appreciate you coming on the show and talking about it.
Wolfgang Meier Thank you very much as well!
Leo Laporte Thank you, Wolfgang!
Post-Interview Discussion About Significance of eXist, XQuery, and history.state.gov (46:59)
Leo Laporte Well, I'm glad you took the lead on that, because I think I know what XML is, but after that it was, like, all down-hill for me!
Randal Schwartz No, I understand a little bit of it. It's what's neat is when you go to something like the Office of the Historian, the State Department site that he was talking about, what a slick site that is, and it's all database-driven, and, as he said, this guy wasn't really, you know, an expert of database programming or even PHP, but he was able to create a site that's extremely slick and desirable. If people wanna take a look at it, it's history.state.gov, and there's a ton of documents there. I found this document from the Lyndon B. Johnson administration, it's titled "The Indian Food Crisis" - I don't think he's talking about chicken masala, but, anyway, you can read all about it here, and there's documents and there's links, and this is an example of a case of where you have a ton of information, and you need a way to easily make it accessible, and this made a great site, it's very navigable.
Randal Schwartz And it needs to be repackaged in a way, I mean, because the thing about an XML document that's marked-up is that there's a lot of, probably, meta-information that would just be - "blah" - onto the screen. You wanna be able to take that and index it based on that, but you don't need to necessarily present that, if somebody's just trying to read the body of the document. So to have processors in the middle that understand the native XML format of the document and can take it and present it as natural HTML on a browser with all the navigation stuff around the outside, that's exactly what XSLT and now XQL were designed to do. And I really like it, because he's got a complete open source stack that can be the retail web publishing part of it as well, which is just, that's incredible. Like there's some APIs there too that can let me go query those documents from a RESTful interface somewhere on the side of that, that's all you can see from the front page, but that's very cool, very-very great use of technology, exactly the kinda stuff that we're looking for with open source.
Leo Laporte REST, WebDAV, SOAP, XML-RPC, ATOM, I mean, it's... fully buzzword-compliant.
Randal Schwartz Exactly, all the TLA's you can have in one session, exactly. Well, you know when his home page starts referring to w3.org pages every other link, it's going to be a little thick there for a bit. My head spun a little. I tell you, I'm a decent programmer and my head was spinning a little bit as I was staring at this stuff, so - but, you know, that's not to scare anybody away, please.
Leo Laporte But don't most people, just out of curiosity, but most people, nobody really deals with XML directly, they're gonna use a library to do this, right?
Randal Schwartz Well, yeah, and there's a number of libraries that are out there, I mean, there's a whole stack of stuff out there for every language out there - Java, Perl, PHP - that all deal with this stuff, and, but the great thing is that, you know, the maturity of the stuff for XSLT is pretty much established, but XQL is still coming into its own, and to have something that already has this in a way that everything all links together nicely, and, like I was seeing on one of their pages, he was talking about how, you know, I can have a query coming from another box that squirts the entire XQL query into the system, and if I squirt the same query a second time, it remembers what it did the first time, it caches that automatically for me, so I don't have to have cache logic on my side to figure out whether that's the same query or not and whether the data may have been updated or not. So there're a lotta optimisations in there already, I mean, it seems like a fairly mature product, so I'm really looking forward to seeing what comes outta this, like, if you're listening at home and you start adopting this, drop me an email on, you know, how this worked out for you, for this, because this sounds like a really fascinating sorta thing.
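Illustration (not spoken in the episode): server-side caching keyed on the query text, with awareness of updates, might look roughly like the Python sketch below. The transcript does not say what eXist caches (compiled queries, results, or something else), so this sketch simply assumes cached results that are discarded when the data changes. `CachingStore` is a made-up class, and a substring search stands in for a real XQuery.

```python
class CachingStore:
    """Toy document store that remembers results per query text."""

    def __init__(self, documents):
        self.documents = dict(documents)
        self.version = 0        # bumped on every update
        self.evaluations = 0    # how often a query was really evaluated
        self._cache = {}        # query text -> (version, result)

    def update(self, name, body):
        self.documents[name] = body
        self.version += 1       # makes all older cache entries stale

    def query(self, text):
        cached = self._cache.get(text)
        if cached is not None and cached[0] == self.version:
            return cached[1]    # same query, unchanged data: reuse
        self.evaluations += 1
        result = sorted(n for n, body in self.documents.items() if text in body)
        self._cache[text] = (self.version, result)
        return result


store = CachingStore({"a.xml": "<p>hello xml</p>", "b.xml": "<p>plain</p>"})
first = store.query("xml")
second = store.query("xml")          # served from the cache
store.update("b.xml", "<p>xml too</p>")
third = store.query("xml")           # data changed, evaluated again
print(first, second, third, store.evaluations)
```

The client sends the same query text each time and needs no cache logic of its own, which is the convenience described above.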