This entry is about a problem at work, and how I helped to resolve it. It is due soon as a memo to my summer co-op class. This entry is long, but you don't mind, do you?
My team at work recently received a task: to simulate 200 users simultaneously processing thousands of transactions in a certain web-based software program. This must be done as fast as possible. Without going into too much detail, I'll describe the setup and the problem.
The aforementioned 200 users must first input thousands of transactions, receiving from the web-based software a unique ID for every transaction. Then they (either all of them simultaneously or some of them) switch to the next stage: load a page of unprocessed transactions, select a transaction from the list, and process it. Note the race condition: if two users load the page at about the same time, they may choose to process the same transaction. The first to reach the finish line will succeed, but the other (or multiple others) attempting to process the same transaction will get an error. These simulated users do not communicate among themselves, so those who get the error do not know what happened: they just know that their action resulted in failure. What to do?
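To make the race concrete, here is a minimal single-threaded model of it in C. This is only an illustration, not code from the project: the names `processed`, `first_unprocessed`, and `process` are hypothetical, and the real web application is reduced to an array of flags.

```c
#include <assert.h>
#include <stdbool.h>

#define TXN_COUNT 3

/* Server-side state (hypothetical model): true once a transaction
   has been processed. */
static bool processed[TXN_COUNT];

/* What a user sees after loading the page: the index of the first
   unprocessed transaction, or -1 if none remain. */
int first_unprocessed(void) {
    int i;
    for (i = 0; i < TXN_COUNT; i++) {
        if (!processed[i]) {
            return i;
        }
    }
    return -1;
}

/* Processing succeeds only for the first user to submit a given
   transaction; everyone after that gets a failure. */
bool process(int txn) {
    assert(txn >= 0 && txn < TXN_COUNT);
    if (processed[txn]) {
        return false;
    }
    processed[txn] = true;
    return true;
}
```

If two users each call `first_unprocessed()` before either calls `process()`, both get the same index. The first `process()` call then returns `true` and the second returns `false`, and that second result is the unexplained failure the simulated user sees.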
The short-term and simple solution was to segment all 200 of these users: each user can only see his own transactions. This, however, does not reflect real-world conditions. In production, multiple user IDs are able to share the list of transactions. Additionally, there's no pride in the segmenting solution.
To avoid the race condition that results in an error, the simulated users must process transactions deterministically, instead of choosing the next seemingly available transaction. But if these users do not, by default, communicate either with each other or with any central entity, what to do?
Fortunately, the testing software that my team uses allows loading and using custom DLLs for every simulated user! Typically, DLLs are used for sharing common functionality between programs. In this testing software, DLLs allow me to utilize functionality that the software was not designed to support explicitly.
I proposed to my manager the concept of a DLL which allows each simulated user to communicate with a central database. In the first stage of our testing process, simulated users would input transactions, receive unique IDs, and store them in the database. In the second stage, where these users process transactions, each user would query the central database and receive the next available ID. Once an ID is issued to a user, it would not be reissued to anyone else. Thanks to the inherent qualities of a relational database (see ACID), this race condition is gone, and future race conditions of this nature are no longer a concern.
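The "issue each ID exactly once" rule can be sketched in a few lines of C. This is an in-memory stand-in for the idea, not the database-backed DLL itself: `id_pool`, `claim_next_id`, and `NO_ID` are hypothetical names, IDs are assumed to be positive, and the atomicity that the database provides is simply assumed here rather than implemented.

```c
#include <assert.h>
#include <stddef.h>

#define NO_ID (-1L) /* assumes real IDs are always positive */

/* Stand-in for the central table: the IDs stored during stage one,
   plus a cursor counting how many have already been issued. */
typedef struct {
    const long *ids;
    size_t count;
    size_t next;
} id_pool;

/* Hand out the next unissued ID, or NO_ID once the pool is exhausted.
   In the real setup the database would make this step atomic, so two
   users asking at the same moment could never receive the same ID. */
long claim_next_id(id_pool *pool) {
    assert(pool != NULL);
    if (pool->next >= pool->count) {
        return NO_ID;
    }
    return pool->ids[pool->next++];
}
```

Each simulated user would call `claim_next_id()` instead of picking a transaction off the page. No two callers ever receive the same ID, so the "already processed" error cannot occur.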
I thought writing this DLL would be a pretty daunting project, and it was an innovative idea for the line of work that my team is in, so I made my manager aware that it would take "a long time", meaning weeks. The reasons I banked on it taking so long, in order of increasing significance:
- I've never written a Windows DLL before
- I had no standard corporate compiler, as middle management denied my team's request for the installation media for Microsoft Visual Studio; I had to improvise
- My DLL is supposed to interface with a relational database through yet another DLL; I had never used DLLs in programs before
- Despite choosing C as the best language for this project, I've never written a pure C program beyond printing "Hello world" to the screen
The compiler issue was solved by using MinGW with MSYS. This offers GCC in a UNIX-like environment (including make and, more importantly, ls!). It was surprisingly painless to set up, foreshadowing future successes. Open-source/Free compilers and UNIX environments are a godsend. Item 2: solved.
Within 10 minutes of setting up a compiler and executing a custom "Hello world" program in C++, I found a great example of compiling Windows DLLs with MinGW, and confirmed success by exporting the arduous and convoluted functionality of displaying the greeting into a separate library. DLLs are great. Item 1: solved.
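For a flavor of what such a library can look like, here is a minimal sketch in C. It is not the example mentioned above: the file name `hello.c`, the macro `HELLO_API`, and the function `hello_greeting` are all made up for illustration. With MinGW, a file like this can be built into a DLL with `gcc -shared -o hello.dll hello.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mark a function as exported when building a Windows DLL. The macro
   expands to nothing elsewhere, so the file still compiles on other
   platforms. */
#ifdef _WIN32
#define HELLO_API __declspec(dllexport)
#else
#define HELLO_API
#endif

/* Copy the greeting into buf, truncating to fit, and always terminate
   it. Returns the number of characters written, not counting the
   terminator. */
HELLO_API size_t hello_greeting(char *buf, size_t size) {
    static const char text[] = "Hello world";
    size_t n = sizeof text - 1;

    assert(buf != NULL && size > 0);
    if (n >= size) {
        n = size - 1;
    }
    memcpy(buf, text, n);
    buf[n] = '\0';
    return n;
}
```

The caller owns the buffer and the library only fills it in, so no memory has to be allocated on one side of the DLL boundary and freed on the other.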
Then I got down into C and libpq, PostgreSQL's C interface. First, I'd like to discuss libpq. I was expecting to discover idiosyncrasies and non-intuitive gotchas; it is what I am used to after working with software as a user for many years. Yet this was not the case. Everything worked perfectly. Connecting to the database: instant and reliable. Executing a query and receiving results: flawless. Everything was intuitive and worked just right. PostgreSQL and its C interface are awesome. Item 3: solved.
Now, C. My closest experience with C was in 9th grade, when I wrote a Hangman program in C++ with a twist: I chose not to use the AP string (apstring) library and instead to pass character arrays around. My reasoning at the time was that I did not want to bow down to The Man, and wanted to prove to myself and others that I could live without apstring. I succeeded, but recall having an unexplained issue of parts of static text in the greeting being overwritten with random characters. ...Yeah.
Since then I have become a bit more knowledgeable about memory and pointers, and I was ready to tackle passing character arrays, minus the buffer overflows this time.
I've begun to appreciate the beauty and utility of C. Never have I felt more in control of what the DLL is doing, and how it is shuttling data. I had to patch a few buffer overflows that I discovered after determining that strncpy() behaves slightly differently from how I expected it to work, but thankfully the code for my DLL is small enough that I am not losing track of either the big picture or the details of its every aspect. Therefore I can test every bit of the functionality of the DLL, trace it, and verify that it is doing exactly what it should be doing. The DLL works, and it is superfast. The last item, item 4: solved.
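The post does not say which strncpy() behavior was the surprise. A common one is that strncpy() writes no terminating null byte when the source has at least as many characters as the limit it is given. A minimal sketch of a wrapper that guards against this, with the hypothetical name `copy_string`, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (a buffer of `size` bytes), truncating if needed,
   and always terminate dst. Returns 1 if the whole string fit and 0
   if it was truncated. */
int copy_string(char *dst, size_t size, const char *src) {
    assert(dst != NULL && src != NULL && size > 0);
    /* Leave room for the terminator: strncpy() alone would not add
       one when src has size - 1 or more characters. */
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
    return strlen(src) < size;
}
```

With a 5-byte buffer, copying "transaction" leaves "tran" in the buffer and returns 0, so the caller can tell that the value was cut short rather than silently passing a damaged ID along.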
It took me about two workdays to complete this project, from the time I committed to it to the time I demonstrated the 25-kilobyte beauty to my manager and some coworkers. I am grateful to have access to the excellent tools that made this project possible in the short timeframe: GCC (which thankfully supports Windows), PostgreSQL (which thankfully supports Windows), and C.