Talend Open Studio at DOSUG

I had a great time speaking at DOSUG last night on Talend Open Studio, the eponymous open-source ETL product by the French start-up. I realize ETL doesn’t exactly capture most developers’ imaginations the way cool dynamic languages or cutting-edge web frameworks might, but I think we had fun. The attendees were engaged and had many good comments and questions. I suspect at least a couple of them know the ETL landscape a lot better than I do, but they seemed happy to know there’s a credible open-source product in the marketplace.

I had two main examples to show. The first was a simple (and contrived) transformation of an OPML file into an Excel spreadsheet and a text file. The Excel file contained a list of the names and URLs of all my feeds in a human-readable format. The text file was supposed to contain a list of the unique link types (hint: they were all “rss”), but that part of the demo actually didn’t work properly due to some fault of mine. This being an occupational hazard in live coding not to be dwelt upon when your audience is patiently waiting, I dropped it and moved on.
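For readers who want a feel for what that first job does, here is a rough Python sketch of the same idea. It is not the Talend job from the talk, only an illustration: the sample OPML, the function names, and the use of CSV in place of a real Excel file are all my assumptions for the sake of a self-contained example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical OPML content; a real export would list many more feeds.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Hypothetical feed list</title></head>
  <body>
    <outline text="Example Blog" type="rss" xmlUrl="http://example.com/feed"/>
    <outline text="Folder">
      <outline text="Another Blog" type="rss" xmlUrl="http://example.org/rss"/>
    </outline>
  </body>
</opml>
"""


def extract_feeds(opml_text):
    """Return one dict per feed outline (an outline with an xmlUrl attribute)."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url is None:
            continue  # outlines without xmlUrl are folders, not feeds
        feeds.append({
            "name": outline.get("title") or outline.get("text", ""),
            "url": url,
            "type": outline.get("type", ""),
        })
    return feeds


def feeds_to_csv(feeds):
    """Render feed names and URLs as CSV text (standing in for the Excel output)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "url"])
    for feed in feeds:
        writer.writerow([feed["name"], feed["url"]])
    return buffer.getvalue()


def unique_link_types(feeds):
    """Return the sorted, de-duplicated list of link types."""
    return sorted({feed["type"] for feed in feeds if feed["type"]})


if __name__ == "__main__":
    feeds = extract_feeds(SAMPLE_OPML)
    print(feeds_to_csv(feeds))
    print(unique_link_types(feeds))
```

The de-duplication step is the part that failed in the live demo; in this sketch it is nothing more than a set comprehension over the `type` attribute.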

After the first demo, I talked a little about basic data warehousing principles as outlined by Kimball and Caserta. This kind of thing is tricky with a diverse audience, because the speaker runs the dual risks of insulting the informed and not informing those new to the subject. Brevity is usually the best policy.

The second demo showed a real-life transactional schema from a start-up I had been involved with a few years back. (The present custodians of the data were kind enough to share a sanitized copy of it with me for this demo.) I showed a few transformations of transactional data of varying levels of complexity into the relevant fact and dimension tables, including some look-ups from external text files and one or two interesting joins on the transactional inputs. Mind you, I didn’t proceed to show any neat analytical tools running on the newly minted warehouse, but the OLAP world is your oyster once the ETL job runs to completion. Relatively speaking.

A Talend job showing the creation of an order line item dimension.
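To make the fact-and-dimension idea a bit more concrete, here is a minimal Python sketch of that style of transformation. None of it comes from the actual demo schema: the order rows, the product lookup file, the column names, and the surrogate-key scheme are all hypothetical and chosen only to keep the example small.

```python
import csv
import io

# Hypothetical transactional rows; prices are integer cents to avoid float rounding.
ORDER_LINES = [
    {"order_id": 1001, "sku": "A-1", "quantity": 2, "unit_price_cents": 950},
    {"order_id": 1001, "sku": "B-7", "quantity": 1, "unit_price_cents": 2000},
    {"order_id": 1002, "sku": "A-1", "quantity": 5, "unit_price_cents": 950},
]

# Hypothetical external text file used as a look-up source.
PRODUCT_LOOKUP_TEXT = "sku,category\nA-1,Widgets\nB-7,Gadgets\n"


def load_lookup(text):
    """Parse the look-up file into a sku -> category mapping."""
    return {row["sku"]: row["category"] for row in csv.DictReader(io.StringIO(text))}


def build_product_dimension(order_lines, categories):
    """Assign a surrogate key to each distinct SKU, in order of first appearance."""
    dimension = {}
    for line in order_lines:
        sku = line["sku"]
        if sku not in dimension:
            dimension[sku] = {
                "product_key": len(dimension) + 1,
                "sku": sku,
                "category": categories.get(sku, "Unknown"),
            }
    return dimension


def build_fact_rows(order_lines, dimension):
    """Replace the natural key with the surrogate key and compute the measure."""
    return [
        {
            "order_id": line["order_id"],
            "product_key": dimension[line["sku"]]["product_key"],
            "quantity": line["quantity"],
            "extended_price_cents": line["quantity"] * line["unit_price_cents"],
        }
        for line in order_lines
    ]


if __name__ == "__main__":
    categories = load_lookup(PRODUCT_LOOKUP_TEXT)
    dimension = build_product_dimension(ORDER_LINES, categories)
    for fact in build_fact_rows(ORDER_LINES, dimension):
        print(fact)
```

A graphical ETL tool expresses the same steps as connected components rather than functions, but the shape of the work (look up, key, then load facts) is similar.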


I was frank about Talend’s weaknesses. There are a few tutorial screencasts on the web site, but other than that I don’t consider the getting-started documentation to be particularly smooth. The Business Modeler is a confusing addition to the product: a third-rate drawing program that distracts the newcomer and adds no discernible value to the suite. The lack of credible Mac support is as disappointing as it is surprising, given that the tool is entirely Eclipse-based. However, I still see the tool as an option very much worth evaluating if you have needs in the space.

All in all, I’m happy with how the talk went, and I’d like to put the tool to use in a production environment at some point soon. I hope to be able to make a few upgrades to the talk and give it at some other local groups as the opportunity arises. I’ll update with a link to Slideshare as soon as I get the deck upgraded.

Another thing: I did wear all black to the talk. And yes, Matthew McCullough did play a Johnny Cash song just before the meeting got started. See his post on the event for another account of how it went.
