Using JBossESB and Wise to implement the ETL phase of a big Data Warehouse

As I wrote in some previous posts, my fine team and I have been working for a while on a project that uses the JBossESB Wise action in a real-world enterprise application. We are using it for the ETL (Extract, Transform, Load) phase of a big DWH (Data Warehouse) with incremental data loading.

In a nutshell, we track logical changes on an OLTP database (it’s a financial DB where every change can be logically associated with a single company, or at least with a network of companies related for various reasons). Then we use JBossESB (and in particular the SQLGateway) to periodically process the modified companies, extracting and enriching the information to be loaded into the DWH instance. Where does Wise fit in? Well, a lot of the information and business rules needed to extract or enrich data have been implemented as web services over the last 3 or 4 years, so it’s pretty natural to reuse them in this new application.

OK, that’s the bird’s-eye view of the problem and the solution. In the rest of the post I’ll go into more formal detail, starting with a description of the requirements and the environment.

Requirements and environment description

The main requirement was to collect a set of data about a large number of companies (about 5 million) in a DWH for marketing analysis. The data comes from different systems: three different OLTP relational databases, a legacy host-based system, and an external provider. The good news is that both the host system and the external provider are accessible through web services. Moreover, the OLTP databases already expose some web services that extract data applying complex business rules; they don’t cover all the requirements, but these DBs are completely under the control of our development team, so dedicated JDBC and/or EJB3 access could be developed for the new goals.

The end users want their DWH updated daily. The large amount of data made it impossible to extract, transform and load all of it every night, so we decided to keep track of changes on the main OLTP DB and completely reload only the companies that changed (some thousands a day).

Of course this approach isn’t totally new: incremental ETL is pretty common in the DWH world, and every vendor has its own proprietary solution. While these proprietary systems have their place and their strengths, IMHO they aren’t flexible enough to support a heterogeneous environment like the one described. I thought it was better to track the logically significant changes (not very many, in fact) with database triggers and adopt a SOA solution for the ETL. This is better in terms of flexibility and lets us reuse, much more easily, a lot of already written services containing complex business rules.

So the adopted solution is based on JBossESB and is composed of these macro steps:

  1. A set of triggers on 2 of the 3 OLTP DBs mentioned above collects changes and writes a unique identifier of each changed company into a dedicated table.
  2. A SQLGateway consumes this table (the wake-up frequency and the query filters are designed to avoid excessive and useless repeated processing of the same company caused by multiple linked changes).
  3. Each company is processed by a set of action chains. These actions can be locally defined actions reading relational databases, or Wise-based web service invocations. A content-based routing policy routes messages from one action chain to the next.
  4. Finally, the extracted and transformed data are written to the DWH.
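Steps 1 and 2 boil down to collapsing many change rows into one unit of work per company. The Java sketch below shows that idea in memory; in the real system the filtering lives in the SQLGateway query and its wake-up frequency. The `ChangeRow` type and the cutoff rule (rows newer than the cutoff wait for the next wake-up) are assumptions made for illustration, not the actual trigger table or query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of steps 1-2: type names and the cutoff policy are assumptions.
public class ChangePoller {

    /** One row written by a trigger: which company changed, and when. */
    static final class ChangeRow {
        final String companyId;
        final long changedAtMillis;

        ChangeRow(String companyId, long changedAtMillis) {
            this.companyId = companyId;
            this.changedAtMillis = changedAtMillis;
        }
    }

    /**
     * Returns each company changed at or before the cutoff exactly once, in first-seen order.
     * Newer rows are left for the next wake-up, giving related changes still arriving
     * a chance to be collapsed into a single reload of the company.
     */
    static List<String> companiesToProcess(List<ChangeRow> rows, long cutoffMillis) {
        Set<String> distinct = new LinkedHashSet<>();
        for (ChangeRow row : rows) {
            if (row.changedAtMillis <= cutoffMillis) {
                distinct.add(row.companyId);
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        List<ChangeRow> rows = Arrays.asList(
                new ChangeRow("C1", 100), new ChangeRow("C2", 110),
                new ChangeRow("C1", 120), new ChangeRow("C3", 500));
        // C1 changed twice but is processed once; C3 waits for the next wake-up.
        System.out.println(companiesToProcess(rows, 200)); // prints [C1, C2]
    }
}
```

In SQL the same effect would come from a `DISTINCT` on the company identifier plus a time filter in the polling query.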

Point 3 is of course the core of the system. The SQLGateway creates a message containing a POJO called Company, and every subsequent action transforms or enriches this object with the collected data and the applied business rules. Wise-based actions call web services and use Smooks to transform and enrich the input object with the values returned by the web services. Thanks to CBR and the continuous enrichment of the same object, the last action (writeOnDWH) receives an object with all the data needed to be written to the DWH.
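The idea behind step 3 (one object enriched by chain after chain, with a router deciding where it goes next) can be sketched in plain Java. This is not the JBossESB API: `Company`, the chain names, the lambdas standing in for local and Wise-based actions, and the routing rules are all hypothetical. In the real application the chains are action pipelines in jboss-esb.xml and the routing is done by the ESB content-based router.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: chain names, fields and routing rules are illustrative only.
public class EnrichmentPipeline {

    static final int MAX_HOPS = 20; // guard against a router that never terminates

    /** Stand-in for the Company POJO carried through the ESB message. */
    static final class Company {
        final String id;
        final Map<String, String> data = new LinkedHashMap<>();
        final List<String> visitedChains = new ArrayList<>();

        Company(String id) { this.id = id; }
    }

    /** Runs chains one after another; the router picks the next chain, or null to stop. */
    static Company process(Company company,
                           Map<String, List<Consumer<Company>>> chains,
                           Function<Company, String> router,
                           String firstChain) {
        String current = firstChain;
        int hops = 0;
        while (current != null) {
            if (++hops > MAX_HOPS) {
                throw new IllegalStateException("routing loop at chain " + current);
            }
            List<Consumer<Company>> actions = chains.get(current);
            if (actions == null) {
                throw new IllegalArgumentException("unknown chain " + current);
            }
            for (Consumer<Company> action : actions) {
                action.accept(company); // every action enriches the same object
            }
            company.visitedChains.add(current);
            current = router.apply(company); // content-based routing decision
        }
        return company;
    }

    /** Small example: only companies in the "BANK" sector call the external provider. */
    static Company runExample(String id, String sector) {
        Map<String, List<Consumer<Company>>> chains = new HashMap<>();
        chains.put("readOltp", Arrays.asList(c -> c.data.put("sector", sector)));
        chains.put("hostData", Arrays.asList(c -> c.data.put("hostRating", "A")));
        chains.put("externalProvider", Arrays.asList(c -> c.data.put("externalScore", "42")));
        chains.put("writeOnDWH", Arrays.asList(c -> c.data.put("writtenToDwh", "true")));

        Function<Company, String> router = c -> {
            if (!c.data.containsKey("hostRating")) return "hostData";
            if ("BANK".equals(c.data.get("sector")) && !c.data.containsKey("externalScore")) {
                return "externalProvider";
            }
            if (!c.data.containsKey("writtenToDwh")) return "writeOnDWH";
            return null; // fully enriched and written
        };
        return process(new Company(id), chains, router, "readOltp");
    }

    public static void main(String[] args) {
        System.out.println(runExample("C1", "BANK").visitedChains);
        // [readOltp, hostData, externalProvider, writeOnDWH]
        System.out.println(runExample("C2", "RETAIL").visitedChains);
        // [readOltp, hostData, writeOnDWH]
    }
}
```

Keeping the router as a function of the message content, rather than hard-wiring the chain order, is what lets a company skip enrichment phases that do not apply to it.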

Focus on Wise

A lot of actions are simply web service calls implemented with a zero-code approach using Wise. We just had to write jboss-esb.xml fragments for the web service calls and Smooks config files, and a lot of business rules were reused. It has been really GREAT!

I needed to add some patches to the current ESB integration to get the most out of Wise, but the results have been really impressive: we processed something like 90K companies in an hour. What does that mean in finer detail? Well, from Wise’s point of view, about 300K web service calls in an hour!
The ESB’s performance numbers have been impressive too: we are running on a single 64-bit Linux machine (AMD64, two dual-core CPUs) with 10 jms-listeners processing 10 different chains (200 concurrent threads for each jms-listener), for a total of 1.7M actions (Wise and non-Wise) called in an hour.
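To put those hourly figures in perspective, the small Java program below derives per-second and per-company averages from the numbers quoted above. These are averages over the hour computed from the post’s own figures, not separately measured rates.

```java
import java.util.Locale;

// Back-of-envelope averages derived only from the figures quoted in the post.
public class ThroughputCheck {

    static double perSecond(double perHour) {
        return perHour / 3600.0;
    }

    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double companiesPerHour = 90_000;
        double wsCallsPerHour = 300_000;
        double actionsPerHour = 1_700_000;
        int listeners = 10;
        int threadsPerListener = 200;

        print("companies per second", perSecond(companiesPerHour));                      // 25.0
        print("web service calls per second", perSecond(wsCallsPerHour));                // 83.3
        print("actions per second", perSecond(actionsPerHour));                          // 472.2
        print("web service calls per company", ratio(wsCallsPerHour, companiesPerHour)); // 3.3
        print("actions per company", ratio(actionsPerHour, companiesPerHour));           // 18.9
        System.out.println("configured threads: " + listeners * threadsPerListener);     // 2000
    }

    private static void print(String label, double value) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.println(label + ": " + String.format(Locale.ROOT, "%.1f", value));
    }
}
```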

Aren’t these impressive numbers?

Here is the list of patches I applied to the Wise/ESB integration to support my requirements. All the code is committed to my workspace (maeste) in the ESB SVN:

* Feature Request JBESB-2019 (Major): wise should also pass input data to the smooks response mapper, to permit continuous enrichment of the message
* Bug JBESB-2020 (Major): wise has a bug that may make it download too many WSDLs and store them in a temporary dir
* Feature Request JBESB-2021 (Major): add configurability for the location where wise stores smooks reports for its transformations
* Bug JBESB-2022 (Major): wise doesn’t clean its internal smooks cache
* Bug JBESB-2023 (Major): Wise fails to consume a WSDL which contains two schema elements with the same name and different namespaces
* Bug JBESB-2036 (Major): wise’s sample has a problem because targetPackage is not specified in the properties files
* Feature Request JBESB-2037 (Major): avoid excessive reflective inspection of wise classes for better performance
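JBESB-2019 is what makes the continuous enrichment described earlier possible. The difference can be shown with two hypothetical mapping functions in plain Java: one that only sees the web service response, and one that also receives the input object. Both classes are illustrative stand-ins; in Wise the mapping itself is expressed as Smooks configuration, not Java code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical types illustrating the point of JBESB-2019; not the Wise/Smooks API.
public class EnrichingResponseMapper {

    /** Stand-in for a JAX-WS generated response object. */
    static final class RatingResponse {
        final String rating;

        RatingResponse(String rating) { this.rating = rating; }
    }

    /** Stand-in for the message payload enriched by successive actions. */
    static final class Company {
        final Map<String, String> data = new LinkedHashMap<>();
    }

    /** Response-only mapping: can only build a fresh object, dropping earlier enrichment. */
    static Company mapResponseOnly(RatingResponse response) {
        Company result = new Company();
        result.data.put("rating", response.rating);
        return result;
    }

    /** Input-aware mapping: merges the response into the object that was sent. */
    static Company mapWithInput(Company input, RatingResponse response) {
        input.data.put("rating", response.rating);
        return input;
    }

    public static void main(String[] args) {
        Company company = new Company();
        company.data.put("name", "ACME"); // added by an earlier action
        RatingResponse response = new RatingResponse("A");

        System.out.println(mapResponseOnly(response).data);       // {rating=A}
        System.out.println(mapWithInput(company, response).data); // {name=ACME, rating=A}
    }
}
```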

I can’t go into more detail about the implementation or post config files here, because I can’t reveal any business details of the application. In the near future I’ll try to put together an example that is equivalent in technology content but has no link to real business content. If you are interested let me know, but be patient… it’s not a small job and I’m very, very busy these days.

Thanks to my team (special thanks to Paolo and Luca) and to all the contributors of Wise and ESB for making it possible :)

PS: What about the huge split and route quickstarts included in ESB 4.4? Well, they address different problems, even if not far from each other. The main difference is that here we don’t have a huge message to split and route, but a lot of small messages to enrich and then route (content-based) to the next enrichment phases.

Wise (and LMS) have a new web site

After a long time we decided to update the Wise and LMS web site (www.javalinuxlabs.org).

For some time we had this paragraph on the site (on the Wise-specific page):

All the information you can find about Wise on this web site is legacy.

This site refers only to Wise as a web application to call web services, but Wise is changing a lot these days, becoming a general library to implement generic, zero-code web service invocation.
This site will also change a lot in a few days.

We are working with a new and enthusiastic team to evolve Wise: have a look at these two posts on my blog to understand where we are going and, of course, STAY TUNED.

http://www.javalinux.it/wordpress/?p=26

http://www.javalinux.it/wordpress/?p=43

Well, now it’s time to update that site! ;) It is neither finished nor perfect; it will be updated a lot in the next weeks with many additions such as docs, samples, and hopefully a binary release of wise-core.

Stay tuned

BTW, we built the website with Drupal. Nice toy!

JBossESB 4.4 has a new zero-code webservice invoker

We are proud to announce that the recently released JBossESB 4.4 contains a Wise-based implementation of a web service client invoker.

In a nutshell, it is a zero-code web service caller supporting Smooks-based mapping and pluggable JAX-WS handlers. Here is an excerpt of the message with which I presented it to the ESB community (here you can find the original message and the related discussion):

It uses the wsconsume API to dynamically generate client objects and invoke the web service, delegating the dirty job to the JBossWS JAX-WS implementation.
It uses Smooks under the hood to transform user-defined objects into the JAX-WS generated ones.

It also supports standard JAX-WS handlers, and a generic Smooks transformation handler that applies transformations to the generated SOAP messages.
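The handler support can be pictured with a simplified model in plain Java. This is not the `javax.xml.ws.handler` API: `Message`, `Handler` and the `X-Caller` header are hypothetical. The sketch only mirrors the JAX-WS convention that, on the client side, handlers see the outbound request in chain order and the inbound response in reverse order, with a header-rewriting handler standing in for the Smooks transformation handler and a logging handler like the System.out one used in the sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Simplified model of a client-side handler chain; not the javax.xml.ws.handler API.
public class HandlerChainSketch {

    /** Simplified message: headers plus a body (a real handler would see a SOAPMessage). */
    static final class Message {
        final Map<String, String> headers = new LinkedHashMap<>();
        final String body;

        Message(String body) { this.body = body; }

        @Override
        public String toString() { return "headers=" + headers + " body=" + body; }
    }

    interface Handler {
        void handle(Message message, boolean outbound);
    }

    /** Stand-in for the Smooks header-transformation handler: touches outbound messages only. */
    static final class HeaderRewriteHandler implements Handler {
        @Override
        public void handle(Message message, boolean outbound) {
            if (outbound) {
                message.headers.put("X-Caller", "jbossesb-wise"); // hypothetical header
            }
        }
    }

    /** Records request and response, and prints them to System.out. */
    static final class LoggingHandler implements Handler {
        final List<String> lines = new ArrayList<>();

        @Override
        public void handle(Message message, boolean outbound) {
            String line = (outbound ? "request  " : "response ") + message;
            lines.add(line);
            System.out.println(line);
        }
    }

    /** The request passes the chain in order; the response passes it in reverse order. */
    static Message invoke(List<Handler> chain, Message request, UnaryOperator<Message> service) {
        for (Handler handler : chain) {
            handler.handle(request, true);
        }
        Message response = service.apply(request);
        List<Handler> reversed = new ArrayList<>(chain);
        Collections.reverse(reversed);
        for (Handler handler : reversed) {
            handler.handle(response, false);
        }
        return response;
    }

    public static void main(String[] args) {
        LoggingHandler logger = new LoggingHandler();
        List<Handler> chain = Arrays.asList(new HeaderRewriteHandler(), logger);
        // Fake service: echoes the request body in upper case.
        invoke(chain, new Message("getRating C1"),
                req -> new Message(req.body.toUpperCase(Locale.ROOT)));
    }
}
```

Because the logger sits after the header handler in the chain, the logged request already carries the rewritten header.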

You can find it in my workspace under product/services/soap/src/main/java/org/jboss/soa/esb/actions/soap/wise/
I also wrote javadoc for the action class explaining how to use it, and an example demonstrating 3 common use cases:

* Direct call of a simple service where no mapping is needed
* Call of a service using a Smooks java-to-java mapper
* Call of a simple web service without mapping, but with a handler modifying the header with Smooks and a handler logging request and response on System.out
In these 3 examples, don’t forget to have a look at wise-core.properties for some important configuration. Of course this configuration could be integrated into the action’s config in jboss-esb.xml in the future, but this first implementation leaves it there.

On the Wise roadmap I have the implementation of web service calls receiving different resources (CSV, XML and so on) and using Smooks to map them onto JAX-WS generated client objects, giving another interesting opportunity in an ESB environment.
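As a rough picture of that roadmap item, the Java sketch below maps CSV rows onto a request object by column name. It is a hypothetical illustration of the idea only: `CompanyRatingRequest` and the `companyId`/`taxCode` columns are invented, the CSV handling is deliberately naive (no quoted fields), and in Wise the mapping would be done by Smooks rather than hand-written code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Roadmap idea only: the request class and column names are hypothetical.
public class CsvToRequestSketch {

    /** Stand-in for a request class that wsconsume could generate from a WSDL. */
    static final class CompanyRatingRequest {
        String companyId;
        String taxCode;
    }

    /**
     * Maps each CSV data row onto a request object, matching columns by header name.
     * Assumes simple CSV: comma-separated, no quoted fields or embedded commas.
     */
    static List<CompanyRatingRequest> toRequests(List<String> lines) {
        List<CompanyRatingRequest> requests = new ArrayList<>();
        if (lines.isEmpty()) {
            return requests;
        }
        List<String> header = new ArrayList<>();
        for (String name : lines.get(0).split(",", -1)) {
            header.add(name.trim());
        }
        int idColumn = header.indexOf("companyId");
        int taxColumn = header.indexOf("taxCode");
        if (idColumn < 0 || taxColumn < 0) {
            throw new IllegalArgumentException("header must contain companyId and taxCode");
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            if (cells.length < header.size()) {
                throw new IllegalArgumentException("short row: " + line);
            }
            CompanyRatingRequest request = new CompanyRatingRequest();
            request.companyId = cells[idColumn].trim();
            request.taxCode = cells[taxColumn].trim();
            requests.add(request);
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("taxCode,companyId", "T1,C1", "T2,C2");
        for (CompanyRatingRequest r : toRequests(csv)) {
            System.out.println(r.companyId + " -> " + r.taxCode);
        }
    }
}
```

Matching columns by header name rather than by position keeps the mapping stable if the CSV producer reorders its columns.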

It is an initial implementation, and I need to integrate Wise’s object generation with the new Smooks config generator ( http://milyn.codehaus.org/Smooks+User+Guide#SmooksUserGuide-GeneratingtheSmooksBindingConfiguration ) to make the user experience easier.

Moreover, we are working on wise-core to improve it, make it more configurable and pluggable, and support much more. I’ll post a roadmap soon.

Stay tuned!

wise-core in JBossESB: first implementation

As said in this post, one of the possible uses of wise-core (the new core we split out of Wise as an independent library) is to integrate it into JBossESB to build a generic SOAP client that invokes web services, using Smooks transformations to hide from the end user the gap between their own object models and the ones generated dynamically by the JAX-WS tools.
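To make that gap concrete, here is a hand-written Java mapping between a flat user object and a nested, generated-style request and response. In Wise this mapping is declared in a Smooks configuration rather than coded; every class below is hypothetical and only illustrates the shape of the problem Smooks solves.

```java
// Hand-written stand-in for a Smooks java-to-java binding; all classes are hypothetical.
public class ModelGapSketch {

    /** The user's own flat domain object. */
    static final class Customer {
        final String name;
        final String city;

        Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    /** Nested shapes typical of JAX-WS generated classes (hypothetical). */
    static final class Address { String city; }
    static final class Party { String partyName; Address address; }
    static final class RegisterPartyRequest { Party party; }
    static final class RegisterPartyResponse { String partyCode; }

    /** User model to generated request: the request mapper's job. */
    static RegisterPartyRequest toRequest(Customer customer) {
        Address address = new Address();
        address.city = customer.city;
        Party party = new Party();
        party.partyName = customer.name;
        party.address = address;
        RegisterPartyRequest request = new RegisterPartyRequest();
        request.party = party;
        return request;
    }

    /** Generated response to the value the user model needs: the response mapper's job. */
    static String fromResponse(RegisterPartyResponse response) {
        return response.partyCode;
    }

    public static void main(String[] args) {
        RegisterPartyRequest request = toRequest(new Customer("ACME", "Rome"));
        System.out.println(request.party.partyName + " / " + request.party.address.city); // ACME / Rome

        RegisterPartyResponse response = new RegisterPartyResponse();
        response.partyCode = "P-001"; // pretend the web service returned this
        System.out.println(fromResponse(response)); // P-001
    }
}
```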

Well, I contributed some code to JBossESB providing an action which does, in a nutshell, what I described here. My efforts and possible improvements are described in this post on the ESB developer forum.

Give your feedback there.

BTW, I’m developing a real-world application based on ESB and this Wise action: it takes some data from a DB, enriches the message by calling a set of web services through Wise, conditioning these calls with a content-based routing approach, and then writes the data back into the DB. I’m planning a post about this application as soon as my team and I finish it… stay tuned!