Reverse Engineering Logicworks' Accounting Software - Part II

This article originally appeared on Logicworks' old blog.
Logicworks' new blog can be found at www.gatheringclouds.com.

In my previous post I discussed a puzzle I had discovered in a reverse engineering project here at Logicworks. The goal of the project was to keep our closed source accounting system's invoices and purchase orders synchronized with our ticketing, inventory, and CMS system, LogicOps. The specific puzzle was the lack of a primary key in the database tables I was querying and my tentative solution was to identify a set of columns that when taken as a group (concatenated, really) could be guaranteed to be unique. After some analysis with quick Perl scripts, I had identified a set of four columns that I could use to create this composite key, the account ID, the invoice ID, a line-item sequence ID, and a component sequence ID.

Here's an example of the rows for an account, with a quantity and product name column added for clarity:

account_idinvoice_idline_item_sequence_idcomponent_sequence_idquantityproduct_name
012015R272462129920225.0Backup Committment
012015R2724622118401.0BW committment for server (1mb/s)
012015R2724624576001.0Dedicated Xeon SCSI server
012015R27246245760491521.0Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI
012015R27246245760655361.0Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core
012015R27246245760819202.02GB DDRII 667 FBDIMM
012015R27246245760983042.073GB SCSI 10K RPM hard drive
012015R272462457601146882.0300GB SCSI 10K RPM hard drive
012015R272462457601310721.0Microsoft Windows 2008 Enterprise
012015R2724626214401.0Shared loadbalancing service
012015R2724627852801.0Shared firewall service
012015R2724631129601.0Dedicated Xeon SCSI server
012015R27246311296491521.0Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI
012015R27246311296655361.0Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core
012015R27246311296819202.02GB DDRII 667 FBDIMM
012015R27246311296983042.073GB SCSI 10K RPM hard drive
012015R272463112961146882.0300GB SCSI 10K RPM hard drive
012015R272463112961310721.0Microsoft Windows 2008 Enterprise
012015R2724632768001.0Shared loadbalancing service
012015R2724634406401.0Shared firewall service


The account_id and invoice_id columns are obvious enough. Some querying confirmed that these are foreign keys to other tables in the database, even through they were not explicitly defined as such in the database schema. But what are the line_item_sequence_id and component_sequence_id columns? This is one of the perils of reverse engineering. You may find a solution, but you may never understand why something is the way it is. Ideally you'll be able to take the time to put to rest any mysteries that remain in your solution. For this puzzle, it helped to understand these columns' purpose by taking a look at how this customer's invoice looked:

QTYProduct Description
225Backup Committment
1BW committment for server (1mb/s)
1Dedicated Xeon SCSI server
1Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI
1Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core
22GB DDRII 667 FBDIMM
273GB SCSI 10K RPM hard drive
2300GB SCSI 10K RPM hard drive
1Microsoft Windows 2008 Enterprise
1Shared loadbalancing service
1Shared firewall service
1Dedicated Xeon SCSI server
1Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI
1Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core
22GB DDRII 667 FBDIMM
273GB SCSI 10K RPM hard drive
2300GB SCSI 10K RPM hard drive
1Microsoft Windows 2008 Enterprise
1Shared loadbalancing service
1Shared firewall service


Notice how these invoice line-items are grouped? Compare the set of components under one of the chassis' to the corresponding database rows. It appears that grouping of line-items together results in them having the same line_item_sequence_id. Furthermore, the "parent" line item in a group has a component_sequence_id of zero, such as the "Dedicated Xeon SCSI server" rows. Finally, it appears that the component_sequence_id and line_item_sequence_id together provide an ordering for the invoice with component_sequence_id ordering the higher level line items and line_item_sequence_id ordering the nested line items. This mystery is unraveling! Those two columns are used for grouping and ordering the line items within an invoice.

So it appears that the developers of this system did some mixing of presentation and the data model in their design. But that's understandable here since the layout of the line items in an invoice can be considered an essential characteristic of the invoice. The big remaining mystery is the actual values for line_item_sequence_id and component_sequence_id. Why are they so huge? Why not just use 0, 1, 2, 3... and so on? The values almost look random, but to a programmers' eye some of the values are quite conspicuous. At this point I had a solution that I felt I could move forward with, but I didn't feel entirely satisfied that I really understood it. The mystery of the strange values for those two columns could be innocuous, or it could be a ticking timebomb that I'll only discover after weeks of erroneous synchronization fouling up the LogicOps database. I felt like I needed to understand those numbers, so I sat back and started contemplating what might've been the motivations of the developers of this software. In the next post I'll explain what I discovered.