As I wrote in some previous posts, my fine team and I have been working for a while on a project that uses the JBossESB Wise action in a real-world enterprise application. We are using it for the ETL (Extract, Transform, Load) phase of a big DWH (Data Warehouse) with incremental loading of data.
In a nutshell, we trace logical changes on an OLTP database (it’s a financial DB where every change can be logically associated with a single company, or at least with a network of companies related for various reasons). Then we use JBossESB (and in particular SQLGateway) to periodically process the modified companies, extracting and enriching the information to be loaded into the DWH instance. Where does Wise fit in? Well, a lot of the information and business rules needed to extract or enrich data have been implemented as web services over the last 3 or 4 years. So it’s pretty natural to reuse them to implement this latest application.
OK, that’s the bird’s-eye view of the problem and the solution. In the rest of the post I’ll go into more detail, starting with the requirements and the environment.
Requirement and environment description
The main requirement was to collect a set of data about a large set of companies (about 5 million) in a DWH for marketing analysis. This data comes from different systems: 3 different OLTP relational databases, a legacy host-based system, and an external provider. The good news is that both the host system and the external provider are accessible through web services. Moreover, the OLTP databases already have some web services that extract data by applying complex business rules; they don’t cover all the requirements, but these DBs are completely under the control of our development team, so dedicated JDBC and/or EJB3 access can be developed for new goals.
The end users want their DWH updated daily. The large amount of data makes it impossible to extract, transform and load everything every night. We decided to keep track of changes on the main OLTP DB and completely reload only the companies that changed (some thousands a day).
Of course this approach isn’t totally new: incremental ETL is pretty common in the DWH world, and every vendor has its own proprietary solution. While these proprietary systems have their place and their strengths, IMHO they aren’t flexible enough to support a heterogeneous environment like the one described. I thought it better to track the logically significant changes (not a lot, in fact) with proprietary triggers and adopt a SOA solution for the ETL. It is better in terms of flexibility and lets us reuse much more easily a lot of already written services containing complex business rules.
So the solution we adopted is based on JBossESB, and it is composed of these macro steps:
- A set of triggers on 2 of the 3 OLTP DBs mentioned above collects changes and writes a unique identifier of the company to a dedicated table
- A SQLGateway consumes this table (the wake-up frequency and the query filters are designed to avoid excessive and useless repeated processing of companies due to double-linked changes)
- Each company is processed by a set of action chains. These actions can be locally defined actions reading relational databases or Wise-based web service invocations. A content-based router policy routes messages from one action chain to the next.
- Finally, the extracted and transformed data are written to the DWH.
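Steps 1 and 2 can be pictured with a minimal in-memory sketch. Everything in it is hypothetical: `ChangeTableSketch`, `recordChange` and `poll` are illustrative names, not JBossESB or SQLGateway APIs, and a plain list stands in for the dedicated table. It only shows the idea that a company changed several times between two polls is handed to the chains once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: an in-memory stand-in for the change table, not real ESB code.
class ChangeTableSketch {
    // Stand-in for the dedicated table filled by the triggers.
    private final List<String> rows = new ArrayList<>();

    // What a trigger would do: record the id of the company touched by a change.
    void recordChange(String companyId) {
        rows.add(companyId);
    }

    // What one poll would do: consume all pending rows and return each company once,
    // in first-seen order.
    List<String> poll() {
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(rows));
        rows.clear();
        return distinct;
    }

    public static void main(String[] args) {
        ChangeTableSketch table = new ChangeTableSketch();
        table.recordChange("C-1");
        table.recordChange("C-2");
        table.recordChange("C-1"); // second change on the same company
        System.out.println(table.poll());
    }
}
```

In the real system this filtering is the job of the gateway query and its wake-up frequency rather than of Java code; the sketch just makes the intent visible.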
Point 3 is of course the core of the system. The SQLGateway creates a message containing a POJO called Company, and each subsequent action transforms or enriches this object with the data collected and the business rules applied. Wise-based actions call web services and use Smooks to transform and enrich the input object with the values returned by the web services. Using CBR and continuous enrichment of the same object, the last action (writeOnDWH) receives an object with all the data that needs to be written to the DWH.
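To make the enrich-then-route idea concrete, here is a minimal plain-Java sketch. It is not JBossESB, Wise or Smooks code: the `Chain` type, the chain names `readOltp` and `callWebService`, and the `legalForm` and `rating` fields are all assumptions made up for illustration. Only the `Company` object and the final `writeOnDWH` step come from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these types come from JBossESB or Wise.
class EnrichmentChainSketch {

    // Stand-in for the Company POJO carried in the message.
    static final class Company {
        final String id;
        final Map<String, Object> data = new LinkedHashMap<>();
        Company(String id) { this.id = id; }
    }

    // One action chain: enrichment actions plus a content-based routing rule.
    static final class Chain {
        final List<UnaryOperator<Company>> actions;
        final Function<Company, String> router; // next chain name, or null to stop
        Chain(List<UnaryOperator<Company>> actions, Function<Company, String> router) {
            this.actions = actions;
            this.router = router;
        }
    }

    // Runs the same object through chain after chain until a router returns null.
    static Company process(Company company, Map<String, Chain> chains, String start) {
        String current = start;
        while (current != null) {
            Chain chain = chains.get(current);
            if (chain == null) {
                throw new IllegalArgumentException("Unknown chain: " + current);
            }
            for (UnaryOperator<Company> action : chain.actions) {
                company = action.apply(company);
            }
            current = chain.router.apply(company);
        }
        return company;
    }

    // Helper building an action that adds one field to the company.
    static UnaryOperator<Company> put(String key, Object value) {
        return c -> { c.data.put(key, value); return c; };
    }

    static Map<String, Chain> demoChains() {
        Map<String, Chain> chains = new HashMap<>();
        // Local action reading a database (simulated), then routing on content.
        chains.put("readOltp", new Chain(
                Arrays.asList(put("legalForm", "SPA")),
                c -> "SPA".equals(c.data.get("legalForm")) ? "callWebService" : "writeOnDWH"));
        // Stand-in for a web service call whose result enriches the object.
        chains.put("callWebService", new Chain(
                Arrays.asList(put("rating", "A")),
                c -> "writeOnDWH"));
        // Last chain: by now the object carries everything to be written.
        chains.put("writeOnDWH", new Chain(
                Arrays.asList(put("written", Boolean.TRUE)),
                c -> null));
        return chains;
    }

    public static void main(String[] args) {
        Company result = process(new Company("C-1"), demoChains(), "readOltp");
        System.out.println(result.data);
    }
}
```

The point of the design is that every chain receives and returns the same object, so the routing rule can look at whatever previous chains have already added.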
Focus on Wise
A lot of actions are simply web service calls implemented with a zero-code approach using Wise. We just had to write a jboss-esb.xml fragment for each web service call and the Smooks config files to get a lot of business rules reused. It has been really GREAT!
I needed to add some patches to the current integration in ESB to get the most out of Wise, but the results have been really impressive: we had something like 90K companies processed in an hour. What does that mean in finer detail? Well, from Wise’s point of view, about 300K web service calls in an hour!
The performance and numbers of the ESB have been impressive too: we are running on a single Linux64 machine (AMD64, double dual core) with 10 jms-listeners processing 10 different chains (200 concurrent threads for each jms-listener), for a total of 1.7M actions (Wise and non-Wise) called in an hour.
Aren’t these impressive numbers?
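For anyone who prefers per-second rates, here is a small sketch that only redoes the arithmetic on the figures quoted above. The class and method names are mine, and since the hourly figures are rounded, the results are approximate too.

```java
import java.util.Locale;

// Back-of-the-envelope arithmetic on the rounded figures quoted in the post.
class ThroughputSketch {
    static double perSecond(long perHour) {
        return perHour / 3600.0;
    }

    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        long companiesPerHour = 90_000;   // companies processed in an hour
        long wsCallsPerHour = 300_000;    // web service calls made through Wise
        long actionsPerHour = 1_700_000;  // all actions, Wise and non-Wise
        int listeners = 10;
        int threadsPerListener = 200;

        System.out.printf(Locale.ROOT, "companies/s: %.1f%n", perSecond(companiesPerHour));
        System.out.printf(Locale.ROOT, "ws calls/s: %.1f%n", perSecond(wsCallsPerHour));
        System.out.printf(Locale.ROOT, "actions/s: %.1f%n", perSecond(actionsPerHour));
        System.out.printf(Locale.ROOT, "ws calls per company: %.1f%n",
                ratio(wsCallsPerHour, companiesPerHour));
        System.out.printf(Locale.ROOT, "actions per company: %.1f%n",
                ratio(actionsPerHour, companiesPerHour));
        System.out.println("max concurrent threads: " + listeners * threadsPerListener);
    }
}
```

With these rounded inputs that works out to 25 companies, roughly 83 web service calls and roughly 472 actions per second, which is a bit more than 3 web service calls and about 19 actions per company.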
Here is the list of patches I applied to the Wise/ESB integration to support my requirements. All the code is committed in my workspace (maeste) in the ESB svn:
I can’t go into more detail about the implementation or post config files here because I can’t reveal any business details of the application. In the near future I’ll try to put together an example that is totally equivalent in technology content, but without any link to the real business content. If you are interested let me know, but be patient… it’s not a joke and I’m very, very busy these days.
Thanks to my team (special thanks to Paolo and Luca) and to all the contributors of Wise and ESB for making it possible.
PS: what about the huge split and route quickstarts included in ESB 4.4? Well, they cover different problems, even if not far from each other. The main difference is that here we don’t have a huge message to split and route, but a lot of small messages to enrich and then route (content based) to the next enrichment phases.