Driving Business Processes in Ruby
by Kenneth Kalmer on July 6, 2009
Decisions, decisions. As Ruby developers we face them every day. Some are easier to make than others, and they have the ability to shape or break a project (even a business). We’ve faced the decisions of ActiveRecord or DataMapper, jQuery or Prototype, Merb or Rails. Now the decisions are getting tougher, and the impact of the next ones is even bigger. I recently tackled the issue of choosing between AMQP and XMPP (and I’m honestly still on the fence, even though I’m using both in production), and there is the SQL or NoSQL decision. So what is your next big decision?
Do you externalize your business processes? Or, do you use a state machine or workflow engine?
John Mettraux posted a powerful read on the fact that state machines are not workflow engines, and I share the sentiment completely. I’m a vocal user of John’s ruote, a Ruby workflow engine, but I also use state machines, and I use them in tandem.
I chose early on to externalize the business processes for our ISP in a Box product, to help break the project into manageable components, and this article shares how and why.
Limitations of state machines
I’m not giving any background on what state machines are, or any of the possible uses. I’m going to highlight some limitations of state machines from a business process perspective, especially state machines that hook into ActiveRecord (or database models).
Lack of context
Knowing what state an object is in isn’t always enough. For decision makers (autonomous or human), context plays a definitive part in making a business decision. If a service ends up in a “suspended” state, for argument’s sake, decision makers need to know why. Was the service suspended due to bad payment, was it suspended due to abuse of the terms of use, or was it suspended because client relations had a bad day? Having multiple states to disambiguate the reason for the suspension is not cool.
Lack of history
Extending on the context topic above, the history of how an object got to a state might be just as important to a decision maker. It is also important for a reporter, or observer in the process. Business processes inherently have multiple paths; it is just never as simple as it seems. Placing an order for a book at Amazon, in theory, fires off business processes unimaginable to us mere mortals. Think about it for a second: they need to check stock, decide from which warehouse to dispense stock, work out optimum shipping combinations, manage re-orders, handle invalid shipping destinations, and so much more. Calling events on an order object just won’t cut it. Each decision maker in such a huge process needs to know why the work ended up with them. A shipping coordinator in New York might question why he has to ship a book to Seattle, but if he knows that Los Angeles is out of stock he won’t mind doing it. I know they’re fully automated, but play along for a second.
Open ended
Another thing about state machines is their open ended nature. For a lot of applications this is brilliant, but not for business processes. Business processes have a definite start and end, with whatever means needed to get from the one to the other. State machines can change from any state to another. This can, however, be controlled with guards or by setting up a limited transition table, but then the machine gets clunky and difficult to manage. It loses its appeal.
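To make the trade-off concrete, here is a minimal sketch in plain Ruby of a service with a limited transition table. The class name, states and events are all hypothetical, and nothing here comes from a particular state machine gem.

```ruby
# Hypothetical example: a hand-rolled state machine with a transition table.
class Service
  class InvalidTransition < StandardError; end

  # event => { from_state => to_state }
  TRANSITIONS = {
    :activate => { :pending => :active, :suspended => :active },
    :suspend  => { :active => :suspended },
    :cancel   => { :active => :deleted, :suspended => :deleted }
  }

  attr_reader :state

  def initialize
    @state = :pending
  end

  # Moves to the target state, or raises if the table has no such transition.
  def fire(event)
    target = TRANSITIONS.fetch(event, {})[@state]
    raise InvalidTransition, "cannot #{event} from #{@state}" unless target
    @state = target
  end
end

service = Service.new
service.fire(:activate)
service.fire(:suspend)
puts service.state
```

Every new rule means another row in the table, and the final state still says nothing about why the service was suspended.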
Versionless
State machines are versionless by nature, and business processes are constantly evolving and getting refactored. As decision makers gain more experience, business processes get leaner and meaner. Especially in times of financial uncertainty there is a big focus on driving down costs and cutting out unneeded, costly steps from business processes. In a state machine this doesn’t necessarily translate into fewer states, or fewer transitions, but it does make it difficult to determine if a process is still part of the old regime, or the new one.
State machines are good at expressing behaviors
And this should be leveraged by the developer. However, behaviors hardly come close to defining a business process as a whole. The behavior of an object can be seen as how it behaves during a business process, and not the process itself. More on this later when I discuss combining state machines and workflow engines.
Don’t abandon state machines!
To proponents of state machines this post will definitely seem like heresy, and I do apologise. State machines are wonderful tools, and at the end of the article you’ll see how I do use them together with a workflow engine. My argument for the post is that state machines are not the right tool to drive business processes. Business processes are big and bulky and far more complex than what they appear on the surface. They need a suitable environment to run in, something stable to drive them, and a database model with a state machine is hardly the place.
Externalize your business processes
Externalizing your business processes and running them inside a workflow engine is a big decision that has massive payoffs if you implement it correctly. If you do it wrong, you will most likely hate me for writing this article, and pretty much everyone in the BPM/WFE space. For me, the decision to externalize the process is paying off handsomely, although I’m still resolving a lot of unanticipated issues (i.e. scaling the workflows).
Enter ruote
ruote is a Ruby workflow engine, which parses process definitions into expression trees and executes them. Greek? Don’t worry. Process definitions, a ruote DSEL, are Ruby classes that define the flow of your business process. They are made up of expressions (think methods), and each expression plays its part in the business process. This is akin to a flowchart, where you have a start and an end, or multiple ends to the process. Expressions can be decision points (diamonds), participants that perform work (blocks), looping constructs and plenty of other goodies. ruote is a business process operating system…
Workflow engines are business process operating systems
Bold statement, yes, but not a lie. Workflow engines need to be solid and reliable environments. We rely on Apache to reliably house our running application, on databases to reliably store our information, and on the operating system to reliably run all of this. If the environment is stable, we can focus on our own code, without much care. The same holds true for our business processes. A workflow engine doesn’t have office hours; it needs to be available at all times and running flawlessly. It also requires a lot of instruments that coordinate our efforts, to keep our business afloat. There is persistence, thread safety and concurrent access of work, schedulers and much more at play.
A part of me thinks that state machines are so attractive to us as Ruby developers because we don’t want to manage all the real work involved in running business processes. We like turning ORM’s into Thor’s hammer, so to speak.
Participation
As your business process is executing it will interact with various participants. Participants can be fully autonomous, or people. The concept of participants is foreign at first, because it doesn’t map to users of the system directly. To give you a rough idea of participants, consider a simple domain registration process. Our clients register a new domain in the system, which initiates a new instance of a defined business process. First we add the domain to our DNS servers, then we register the domain with a registrar, then we notify the client. Without automation we’ll have one participant: the ActiveRecord participant. This participant simply saves workitems in a table in a database, and we can poll this table looking for work. Using the workitem’s payload we can determine which of three unique pieces of work needs to be performed:
- DNS Admin to add domain to servers
- DNS Admin to register domain with registrar
- Client message when they log back in again (or view domain, or whatever)
With automation the participation changes to something like this:
| Participant | What it does |
| --- | --- |
| PowerDNS Participant | Use ActiveResource to add the domain via PowerDNS on Rails |
| Registrar Participant | Register the domain via the registrar’s pathetic XML API |
| Notification Participant | Send an email to the client to tell them registration has completed |
| ActiveRecord Participant | If automation fails, have a workitem ready for support personnel |
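The automated flow can be pictured with a small sketch in plain Ruby. This is not ruote’s API: the `run_process` helper, the lambda participants, the `SUPPORT_QUEUE` array and the workitem hash are hypothetical stand-ins. They only show a workitem travelling through participants, collecting history, and landing with support personnel when automation fails.

```ruby
# Hypothetical sketch, NOT ruote's API: participants are callables that
# receive a workitem (a Hash) and may raise when automation fails.
SUPPORT_QUEUE = []  # stands in for the ActiveRecord participant's table

# Runs each participant in order, recording what happened on the workitem.
def run_process(workitem, steps)
  steps.each do |name, participant|
    begin
      participant.call(workitem)
      workitem[:history] << "#{name}: ok"
    rescue StandardError => e
      workitem[:history] << "#{name}: failed (#{e.message})"
      SUPPORT_QUEUE << workitem  # a human takes over, with the full history
      return :needs_support
    end
  end
  :completed
end

steps = [
  [:powerdns,     lambda { |wi| wi[:dns_added] = true }],
  [:registrar,    lambda { |wi| raise "registrar API timeout" }],
  [:notification, lambda { |wi| wi[:client_notified] = true }]
]

workitem = { :domain => "example.org", :history => [] }
result = run_process(workitem, steps)

puts result
puts workitem[:history]
```

The registrar step is rigged to fail here, so the workitem ends up in the support queue with a record of which steps ran before the failure.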
History and context
Using the above example of automating processes, history and context become quite important when a workitem lands on the support personnel’s desk. They need to know where the error happened, what led up to the error, who is involved in the process, and maybe something more. The ability to extract a JSON-formatted expression tree from the engine allows us to use ruote-fluo to graph the process on a canvas tag for the support personnel, so they know exactly what happened prior to the error being reported. This knowledge of prior actions is invaluable when deciding on a course of action, which in business is the difference between closing or cancelling a deal.
Time sensitive & long running processes
Automated business processes can usually finish within a couple of minutes, or seconds, depending on their complexity and the number of autonomous participants involved. But business is never as simple as it seems, and time is usually the enemy. A business process might indicate that if a support call has been open without an initial response for more than 4 hours it needs to be escalated. Others might indicate that we don’t send SMS messages between 19:00 and 6:00, so we don’t wake our clients. This mixed bag of sleep and timeout expressions is difficult to get right on your own, and a workflow engine should (ruote does) support this out of the box.
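The two rules above are easy to state but fiddly to code by hand. The following plain Ruby sketch shows only the rules themselves, not how ruote expresses them. The helper names and the choice of 06:00 as the next allowed SMS slot are assumptions for illustration.

```ruby
require 'date'

# Hypothetical helpers illustrating the two time rules from the text.
ESCALATE_AFTER = 4 * 60 * 60  # four hours, in seconds

# True when a call without a response has been open for more than four hours.
def escalate?(opened_at, now, responded = false)
  !responded && (now - opened_at) > ESCALATE_AFTER
end

# True between 19:00 and 06:00, when no SMS should be sent.
def quiet_hours?(time)
  time.hour >= 19 || time.hour < 6
end

# Returns the time itself when sending is allowed, otherwise the next 06:00.
def next_sms_slot(time)
  return time unless quiet_hours?(time)
  day = time.to_date
  day += 1 if time.hour >= 19  # evening: wait until tomorrow morning
  Time.local(day.year, day.month, day.day, 6, 0, 0)
end

opened = Time.local(2009, 7, 6, 8, 0, 0)
puts escalate?(opened, Time.local(2009, 7, 6, 13, 0, 0))
puts next_sms_slot(Time.local(2009, 7, 6, 20, 30, 0))
```

Something still has to wake up at the right moment and apply these checks to every running process, which is the scheduling work a workflow engine takes off your hands.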
Versioned processes
Process definitions in ruote are parsed upon launching the process. Ruote converts the process definition into an expression tree and works with the expression tree from that point forward. Process definitions can then be fine-tuned and altered and all new launches will use the updated definitions. Existing process instances stick to their expression tree in the engine, so they remain unaffected by the changes and will play out exactly as intended. The expression trees are not set in stone either: once in the engine they can be altered at runtime, though I still have to explore those abilities of ruote myself.
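The idea of instances keeping their own tree can be sketched in a few lines of plain Ruby. This is not how ruote is implemented; the `ProcessInstance` class, the array-based tree and the step names are hypothetical, and the deep copy via `Marshal` is just one simple way to freeze a definition at launch time.

```ruby
# Hypothetical sketch, NOT ruote's implementation: a launched instance keeps
# its own deep copy of the definition tree.
class ProcessInstance
  attr_reader :tree

  def initialize(definition)
    # A Marshal round-trip gives a deep copy of nested arrays and strings.
    @tree = Marshal.load(Marshal.dump(definition))
  end
end

definition = ["sequence", ["add_dns", "register_domain", "notify_client"]]
old_instance = ProcessInstance.new(definition)

# The definition is refined after the first launch...
definition[1] << "send_invoice"
new_instance = ProcessInstance.new(definition)

# ...yet the running instance still holds the tree it was launched with.
p old_instance.tree
p new_instance.tree
```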
Going hybrid, using the best of both
I said earlier that I haven’t abandoned state machines, and that I would rather use the two instruments together to accomplish what my system requires. This is in fact very simple to implement and works brilliantly.
ISP in a Box sells services, and these services need to be acquired/provisioned for our clients to make use of them. All our models share a mostly common state machine. When services are provisioned for the first time, they enter their initial state called “pending”, and the provisioning process for said service is launched. The state machine has no idea what is going on at this stage, but ruote runs and happily executes the process. Each process is usually a mix of mostly autonomous participants and then some human intervention if the automation breaks. Other processes are purely human (like web design).
While the service is “pending”, no changes are allowed to the service. Instead, when the process completes in ruote there is a callback, a “webhook”, that calls back to Rails and activates the service. This “activation” is an event defined in the state machine and transitions from “pending” to “active”. Once a service is active (i.e. provisioned) the owner of the service is presented with a lot of new options. These options include changing passwords, upgrading & downgrading and even cancelling. Whenever a change is requested by the client we need to fire off another process, and while the process runs we place the service in an “integrating” state. While a service is “integrating” no changes are allowed, and when the process is done it “activates” the service again.
There are plenty of other states and transitions defined, including states for “suspended” and “deleted” (we don’t remove data from the database). Each event fires off a different process in ruote, leaving the service with only one decision to make: Which process do I launch under these circumstances?
This combination works extremely well for us since the state machines are very small and lean, and they do exactly what is expected of them: indicate state and handle/prevent transitions between said states. They are not burdened with making business decisions; they handle state. Simple, powerful, effective.
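Here is a rough plain Ruby sketch of that division of labour. The `HostedService` class, the launcher lambda and the process names are hypothetical; in our setup the launcher would talk to ruote and `activate` would be triggered by the webhook, neither of which is shown.

```ruby
# Hypothetical sketch: a lean state machine that only guards transitions and
# hands the real work to an external process launcher.
class HostedService
  attr_reader :state

  def initialize(launcher)
    @launcher = launcher  # anything responding to #call(process_name, service)
    @state = :pending
    @launcher.call(:provision, self)
  end

  # Requested by the client; refused while a process is still running.
  def request_change(process_name)
    raise "no changes allowed while #{@state}" unless @state == :active
    @state = :integrating
    @launcher.call(process_name, self)
  end

  # Called back when the external process completes.
  def activate
    unless [:pending, :integrating].include?(@state)
      raise "nothing to activate from #{@state}"
    end
    @state = :active
  end
end

launched = []
launcher = lambda { |name, service| launched << name }

service = HostedService.new(launcher)
service.activate                  # provisioning finished
service.request_change(:upgrade)  # the service is now "integrating"
puts service.state
```

The state machine never knows what the provisioning or upgrade process involves; it only decides which process to launch and blocks changes until it is told the work is done.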
Wow! How do I get going?
Glad you like it, but be careful. As wonderful as the BPM/WFE world sounds, it is a serious leap to take. First of all make sure that your situation necessitates a workflow engine. I’ve pointed a lot of people who asked about ruote to a simpler daemon-kit + AMQP combination for their needs. In the majority of simple automated tasks a state machine with worker processes will suffice. ruote is all about business processes and automating those processes.
Implementing ruote also brings a lot of changes to the way you develop software. For one, ruote cannot run inside Rails like a normal plugin. It needs one instance to be running, not multiple ones, due to the internal scheduler. We’re trying to work out a way around this, but it’s on the back burner for now. This means you’ll be deploying two applications side by side. You’ll probably also start moving some of your code into gems, so you can share it between ruote and your Rails project.
You’ll also need to implement a communication channel between your Rails instances and ruote, or use an existing engine “housing” like ruote-rest, or give me time to finish ruote-kit. Both ruote-rest and ruote-kit offer a RESTful interface to the engine, allowing all your projects in any language to leverage the power of ruote.
State machine versus workflow aside, a major downside to an externalized workflow is the distribution of architecture. The golden rule for distribution is “don’t”, but that’s unrealistic, because large, complex systems often require it, and it pays off in the form of serious decoupling in these cases. So if a workflow engine will be the first distributed step you make then you should think very carefully about what that will cost you. Not to say don’t do it, but it’s not going to fit everywhere, sometimes the “process” is simple enough not to warrant a workflow engine this specialized. There’s no silver bullet. – Nic Young, a recent reviewer of ruote.
But sometimes the distribution has already occurred because of a service oriented architecture. It is natural to wrap participants around services, or to use participants as facades to services. SOA is usually thought of as “in the bank”, but in our case the applications are already playing with services outside of the building. We had no choice, the distribution was already here. Ruote doesn’t know much about distribution, it just dispatches to participants and sometimes receives workitems back, whether it happened in process, or on the other side of the world.
Another potential pitfall is database access. Personally I use ActiveResource to suck data into my participants on a “need to know” basis instead of giving ruote database access and duplicating my models between two projects.
The biggest pitfall of all is learning to write these processes in a concise fashion. The engine and expression language are very, very powerful, and you don’t realise at first how to chain things together. Even now, almost a year later, I find myself refining my process definitions to become leaner and cleaner and easier to maintain. The second biggest pitfall: testing business processes is a pain in the ass. I’m definitely making some progress on conceptualizing a testing methodology for process definitions, but it is still some way off.
Is it worth all this trouble?
Yes! As far as I know we might have the biggest implementation of ruote at the moment. I have roughly 70 process definitions in production, and about 20 different autonomous participants. As for human participants, it’s potentially over a thousand, though in ruote terms it is only one, and I map the payload of the workitem back to the user in our systems.
The next best thing that happened to us while implementing business process definitions was that it gave us a chance to refactor the business as a whole. We’re a small outfit, fewer than 20 people, so the exercise went well. In a large organization business process refactoring, or even just formalizing, is a massive undertaking: it increases the number of stakeholders in the project dramatically, takes more time, and translates into a higher up front cost.
Looking forward
John is working hard on ruote 2.0, which is a complete rewrite of the 0.9 code base. From the outside it will pretty much look the same, but internally it is a whole new beast. I’m taking on a complete rewrite of ruote-rest, christened ruote-kit, which will become the preferred means of exposing ruote to Rails applications. After a lot of chatting with Andrew Timberlake and others in #ruote and on the mailing list, it’s become a real necessity that we provide a framework for testing/specing process definitions. I’ll tackle this once ruote-kit has made it from vaporware into a polished product, and I’m thinking ruote-spec will be its name.
Cleaning up your act, with a little handsoap
by Kenneth Kalmer on June 12, 2009
Handsoap is a new, fresh Ruby library for creating SOAP clients. Why am I excited about soap? I’m not, unless they’re beautiful. Yet, somewhere along the line you are going to be faced with writing a client for a SOAP service, whether it is for your suppliers (4 different crap APIs in my case) or just for mashing up data from a rest-less source.
Ruby has very little to offer in terms of SOAP, afaik it only had soap4r until handsoap came along…
BIG DISCLAIMER: This is not a flame war against soap4r, it’s against SOAP.
The authors and contributors have spent a lot of time and effort over the years to build soap4r up to where it is currently. It is a remarkable piece of work, but the cracks are showing. I’ve spent countless hours digging through the code trying to figure out why my mappings and registries aren’t working as expected. You find code working around bugs in Ruby 1.8.2 and 1.8.4, and we’re all trying to move to 1.9. That is a lot of skeletons to hide…
It’s not all doom for soap4r. I’ve used it to play around with some public APIs for weather and currency data, and if the API is well defined and leverages SOAP as a protocol, soap4r really rocks. The output of wsdl2ruby is a bit tricky to navigate, but mostly you can leave that alone and focus on your code.
But, and there always has to be a but…
A lot of APIs are written by people who have no idea what it is like to be on the consuming side of their mess. This is where soap4r falls flat on its face. Take the case of Postini and the postini gem for instance. The gem was my first gem ever and as far as I’m concerned it was my worst code ever (open source at least). It was an absolute mess, barely did more than what our systems required, and was not expandable in any way. The root of the problem was not soap4r, but the worst combination of WSDL and PDF docs to land on any developer’s screen. The documentation and the WSDL don’t match, not by a long shot, and you have to debug minor issues with wiredumps. I’ll stop my rant now.
Enter handsoap
As development of our premier product came closer to launch, our email systems came under scrutiny and I needed to update the way we interact with Postini through our gem. I couldn’t; I could barely figure out what I did in the first place. If there had been any proper use of the gem outside our company, I would probably have been summoned to criminal court for the injustice I had done in the first release.
So I started over, and I learned about handsoap a couple of weeks ago already.
This was the result in 24 hours:
$ git log --stat 2f91d4d6eaa8bd76c188f20929a19e456d1bb52e..HEAD | grep changed
 2 files changed, 15 insertions(+), 4 deletions(-)
 1 files changed, 38 insertions(+), 0 deletions(-)
 2 files changed, 282 insertions(+), 1 deletions(-)
 1 files changed, 1 insertions(+), 1 deletions(-)
 12 files changed, 9 insertions(+), 682 deletions(-)
 1 files changed, 1 insertions(+), 0 deletions(-)
 8 files changed, 184 insertions(+), 3883 deletions(-)
 6 files changed, 154 insertions(+), 18 deletions(-)
That is over 4000 lines of code, gone! Talk about beauty soap…
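If you want to check the tally, a few lines of Ruby will add up the stat lines from the output above. The variable names are mine, and the script only sums the numbers already shown.

```ruby
# The "files changed" lines from the git log output above.
stat_lines = <<-STATS
  2 files changed, 15 insertions(+), 4 deletions(-)
  1 files changed, 38 insertions(+), 0 deletions(-)
  2 files changed, 282 insertions(+), 1 deletions(-)
  1 files changed, 1 insertions(+), 1 deletions(-)
  12 files changed, 9 insertions(+), 682 deletions(-)
  1 files changed, 1 insertions(+), 0 deletions(-)
  8 files changed, 184 insertions(+), 3883 deletions(-)
  6 files changed, 154 insertions(+), 18 deletions(-)
STATS

insertions = 0
deletions = 0

# Pull the two counts out of every line and keep running totals.
stat_lines.each_line do |line|
  insertions += line[/(\d+) insertions/, 1].to_i
  deletions  += line[/(\d+) deletions/, 1].to_i
end

puts "insertions: #{insertions}"
puts "deletions:  #{deletions}"
puts "net change: #{insertions - deletions}"
```

That comes to 684 insertions against 4589 deletions, so the rewrite removed 3905 more lines than it added.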
At the moment the library is a super thin layer on top of the raw API, but it works surprisingly well and is easier to comprehend.
Handsoap’s beauty is that it leaves you to do all the parsing and prepping of the SOAP requests/responses. At first this sounds daunting, but as you start using it the benefits become very clear and you are in full control of your client. The author published a video tutorial which shows how to build a weather server client in Rails. It is a very fast paced video, but download a copy and step through it piece by piece.
Handsoap does an excellent job of bringing together curb as the HTTP client and Nokogiri for the XML parsing. It delivers, and it is blazingly fast.
Code Teaser
The EndpointResolverService in the postini gem is a nice example of a single method service that is used to return another endpoint for subsequent API calls.
At the very least you need a skeleton class that looks something like this:
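require 'handsoap'

# Where the service lives, and which SOAP version it speaks
EXAMPLE_SERVICE_ENDPOINT = {
  :uri => 'http://example.org/ws/service',
  :version => 2
}

class SomeService < Handsoap::Service
  endpoint EXAMPLE_SERVICE_ENDPOINT

  # Register the namespace prefix used when building request documents
  on_create_document do |doc|
    doc.alias 'end', 'https://api-meta.postini.com/api2/endpointresolver'
  end
end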
After that, you’re pretty much on your own (which turns out to be great). Troels shows in the video tutorial how to use a clever Java application called soapUI to analyze the WSDL and prepare your request markup and response parsing accordingly. It works like a charm.
I have three other internal soap4r-based gems that work on the worst API’s you’ve seen in your life (can you say nusoap?). One of them even lacks a WSDL, and returns serialized PHP data. They were a nightmare to consume, and I’m looking forward to cleaning them up with handsoap as well.
I’m hoping this blog post serves as a motivator for you to clean up your (SOAP) act too. I’ve got two more posts lined up on handsoap, one about mocking your way through writing a SOAP client, and another on using soapUI’s mock service capabilities.
Until then, please share in the comments your worst Ruby SOAP stories (and go easy on soap4r, it’s not their fault).
daemon-kit Progress Report
by Kenneth Kalmer on June 11, 2009
Since I last announced Capistrano support for daemon-kit, a few other things have happened that are steadily moving the project forward. Here is a synopsis of them all:
No more daemons
Sounds contradictory, but I’ve stripped out the daemons gem as a dependency. Everything the daemons gem offered is now handled in-house, and works much better. For more information please read the release announcement.
Better command line argument handling
A lot has been done to smooth out command line argument handling, and daemons can now tap into the command line processing very easily. Details in the release announcement and the RDoc.
Error mails now handled by TMail
This makes them look better and be better formed; the initial code didn’t really produce meaningful mails. A lot of work still needs to be done in this arena.
Log with vigor!
Logging got a lot of attention with the new AbstractLogger interface. Log rotation is now dead simple and logging to Syslog is possible thanks to the SysLogLogger gem. See the RDoc for more information on logging.
Cucumber
Still a lot needs to be done to simplify testing daemon processes, but the inclusion of preliminary cucumber support is one step in the right direction. I’m hoping cucumber will be a great fit for daemon-kit, and help us as daemon developers define the difficult contexts in which our code runs.
Looking forward
Plenty still needs to be done, and the TODO file in the repo gives some insight into what the future holds. Mostly daemon-kit evolves as I need it to for my production environments. On Github we’re close to 200 watchers, and the Google Group does have a little activity.
As time goes on I’m hoping more people get involved and bring suggestions to the table for the framework. So please, do tell me how daemon-kit can make life simpler for you and your daemons.
Capistrano for your daemons
by Kenneth Kalmer on May 26, 2009
Nothing like some positive feedback to keep you looking after your own projects, and the feedback on daemon-kit has been great so far. My TODO list is growing every day with more features that I need, and others need as well. I mostly add features to daemon-kit as I need them right away, and for this reason the docs are still lacking and the support site hasn’t been set up yet.
However, I’ve set up the following channels for daemon-kit communication (and updated the README):
- daemon-kit@googlegroups.com
- #daemon-kit on Freenode
I’ll be doing my best to stay on top of these channels and garner more feedback from the community at large.
Today’s sprint tackled one of the very first things I planned: support for capistrano.
When generating a daemon, use the ‘-d’ flag to specify which deployer to use. Currently only capistrano is supported, but I’ve left the door open for anyone willing to add support for Vlad the Deployer.
$ daemon_kit /path/to/project -d capistrano
This will leave you with a Capfile, which you don’t need to edit, as well as config/deploy.rb and config/deploy/*.rb. The environment specific files in config/deploy are used by the capistrano-ext gem to support multistage deployments for testing your daemons. Shared configuration details go into config/deploy.rb. You need the following gems installed to leverage the deployment options:
- capistrano
- capistrano-ext
Capistrano is historically built around Rails applications, but I lifted the capistrano ‘deploy’ recipe from capistrano-2.5.5 and customized it to fit in with the requirements of daemons. Some steps have been removed, and some added. Here are the tasks available:
$ cap -T
cap deploy                     # Deploys your project.
cap deploy:check               # Test deployment dependencies.
cap deploy:cleanup             # Clean up old releases.
cap deploy:cold                # Deploys and starts a `cold' application.
cap deploy:copy_configs        # Copies any shared configuration files from :...
cap deploy:get_current_version # Get the current revision of the deployed code
cap deploy:pending             # Displays the commits since your last deploy.
cap deploy:pending:diff        # Displays the `diff' since your last deploy.
cap deploy:restart             # Restarts your application.
cap deploy:rollback            # Rolls back to a previous version and restarts.
cap deploy:rollback:code       # Rolls back to the previously deployed version.
cap deploy:setup               # Prepares one or more servers for deployment.
cap deploy:start               # Start the daemon processes.
cap deploy:stop                # Stop the daemon processes.
cap deploy:symlink             # Updates the symlink to the most recently dep...
cap deploy:update              # Copies your project and updates the symlink.
cap deploy:update_code         # Copies your project to the remote servers.
cap deploy:upload              # Copy files to the currently deployed version.
cap invoke                     # Invoke a single command on the remote servers.
cap multistage:prepare         # Stub out the staging config files.
cap production                 # Set the target stage to `production'.
cap shell                      # Begin an interactive Capistrano session.
cap staging                    # Set the target stage to `staging'.
Very close to the original recipes. I’ve still got to clean this up even further, and add some documentation on how to leverage capistrano for daemons, but for the time being this works very well. The recipe can be found in lib/daemon_kit/deployment/capistrano.rb if you have the urge to review it.
Heading towards 0.2, here are the main highlights:
- god & monit config file generation
- Improved rdoc’s and supplementary documentation files
- Finishing off some issues with error handlers
- Better logging