Dave Copeland - contract-based testing for event-driven architectures

Nov 2, 2017

Emit is the conference on event-driven, serverless architectures.

Here we have Dave Copeland, Director of Engineering at Stitch Fix and purveyor of consumer-driven contracts.

Stitch Fix is an e-commerce company: they send you clothes through the mail, and if you like them you buy them. If you buy the whole box, you get a discount. So the engineering team is maintaining some logic about how to charge customers and how to apply the appropriate discount for the number of items they buy.

The engineering team manages most of the operational side of the business—warehousing, purchasing, printing packing slips—and each team owns and maintains their own software, and all of this software talks to each other. So it gets complicated to update different parts of the system without harming the core business.

Their solution? Consumer-driven contracts. Watch Dave's talk below or peruse the transcript for some real-world examples of how this works.

More videos:

The entire playlist of talks is available on our YouTube channel here: Emit Conf 2017



Dave: Hi, thanks for having me here. We're gonna talk about Imagining Contract-Based Testing for Event-driven Architectures. Anyone know what consumer-driven contracts are? Contract-based, have you heard of that? Pact maybe? Cool. So, we'll talk about that. But technology, for technology's sake, is pointless and annoying. We have to have an actual problem that we're solving. So what problem does all this pertain to, right? So we build these software systems, right? And unless we're building a tightly integrated monolithic system, we have lots of different bits of software, particularly, the subject matter of this conference. And those pieces of software don't do any one business thing by themselves, they collaborate together to do a business thing, a business process. And we wanna know if that works, and we wanna know that as we make changes to these little bits of software like, "Does the business process they're supporting still work? Will the change I'm introducing break things?" And we want to know that without clicking around web browsers and doing manual things. We want to know that, you know, in a more automated fashion.

So, as mentioned, I'm a Director of Engineering at Stitch Fix. I'll give you a little back story of us because you'll need that to follow along with my example. So we're a personal styling service for clothes. And at the core of our company is an e-Commerce business, right? We're shipping things that we purchased at wholesale and we're selling them for retail and all that. The difference is that our customers don't get to choose what they get, we choose for them. So we have an algorithm that chooses what clothes we think they're going to like based on information they've given us. The human stylist will look at the output of that algorithm and decide what they're going to get. And then in our warehouse, we find those five items, ship them to the customer, the customer opens the box and sees them for the first time, hopefully, loves them, tries it on, pays for whatever they like, returns whatever they don't like.

And one thing that's worth pointing out is that we, the engineering team at Stitch Fix, build all of the systems that do all of the internal operations. So, I was just talking about a warehouse, so we have software that runs our warehouse, we build that. Software that our buyers use, we have buyers just like Nordstrom's and Macy's would, they have software that we use to manage this whole process. The styling thing that I talked about, we have software for that. So everything is something that we write. And a lot of these systems interact with Synchronous HTTP services. But messaging and events are even more a thing that we do. We have a lot more of that going on and that is a big part of how all of our software works together to implement business processes.

So we're gonna talk about one in particular called the pack slip. This is a picture of it. So this is what you get in the box when you open it up and it's got some information. So there's an order ID, right? So the warehouse knows what it is. There's some items, right? That's what you're getting. And we can see there's some metadata like the descriptions of them and their price. And then we've got this discount... charging a discount logic, right? So if you wanted to buy all five, you get a discount. We want to show that to the customer so they get excited about buying everything in there.

So the process by which this gets created is kind of... there are two parts. So we'll talk about the first part which is all about Synchronous services. And I'll talk about how we test those using contract-based testing. So we all know what that is. And then we'll see how that applies to messaging because messaging is more involved in this.

Cool. All right. So, we've got the basics here, right? So an order ID comes into our warehouse management system and, to generate a pack slip, it will contact the inventory metadata system that has stuff like the item description and the price. It will contact the financial transaction service to get that charging and discount logic, right? Because we don't want the warehouse management system having to know how to do discounts. It'll put all that stuff together and it'll put that into a cache. And then when we wanna print that thing out, the associate is able to grab it out of the cache and print it quickly. So the associate on the floor printing things doesn't have to wait for all these, like, really slow Synchronous Calls.

So we wanna know if this works. So we've already got, like, three pieces of software involved in making this work. And, of course, each piece of software is maintained by a different engineering team. So at Stitch Fix our teams are aligned with different parts of the business. And so you could imagine that this could create a problem, right? Why would the finance engineering team need to know how pack slip printing works, just to make changes to the service that they own and maintain? Why would the merchandise engineering team have to know this whole thing just so they can make changes to the inventory metadata? And maybe if the team is small enough, everybody can understand how things work but as you get bigger, you just really can't. I mean, our team has 100 engineers, which is kind of big but not really that big, and there's no way anyone can understand everything every time they make a change. So how can we make sure that when we change these ancillary systems, we're not breaking the core business process?

So, this just demonstrates more of how our engineering team is organized. We all have our own roadmaps aligned with these parts of the business, and that allows each team to sort of deeply understand that part of the business. And the cost is, again, that they cannot understand the entire business in detail, in whole. So there really is a need for these teams to be able to change the systems that they own, without breaking everything that everybody depends on them for.

So let's focus on the interaction between the warehouse management system and the financial transaction service. So as good engineers, right? We're gonna write a test. I work on the warehouse engineering team. I've written a test of my pack slip printing code, and part of that test is gonna be to assume I'm sending some requests to the financial transaction service. And then it comes back with some sort of payload. I'm gonna assume that payload looks a certain way and I'm gonna feed that into the rest of my test, make sure everything works, right? So we could run that against the actual service, but that, sort of, is very difficult and potentially breaks down at even moderate complexity. So, instead, we run it against a mock version of the service but we capture what happens. So in our tests of the warehouse management system, we're capturing the URL we hit and the payload that we got back, and all these expectations as a contract.
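The consumer side of this idea can be sketched in a few lines. The following is a minimal Python illustration, not Pact's API and not Stitch Fix's code: the `RecordingMock` class, the `build_pack_slip` function, the `/orders/42/pricing` path, and the field names are all assumptions made up for the example. It shows a mock that answers with the payload the warehouse test assumes, while recording the request and response as a contract.

```python
import json


class RecordingMock:
    """Mock of the financial transaction service that records every call."""

    def __init__(self, canned_responses):
        # Maps (method, path) to the response body the consumer assumes it gets.
        self.canned_responses = canned_responses
        self.interactions = []

    def request(self, method, path):
        body = self.canned_responses[(method, path)]
        self.interactions.append({
            "request": {"method": method, "path": path},
            "response": {"status": 200, "body": body},
        })
        return body

    def contract(self, consumer, provider):
        """Everything the consumer relied on during the test, as one artifact."""
        return {"consumer": consumer, "provider": provider,
                "interactions": self.interactions}


def build_pack_slip(order_id, pricing_service):
    """Hypothetical warehouse code under test."""
    pricing = pricing_service.request("GET", f"/orders/{order_id}/pricing")
    return {"order_id": order_id,
            "total": pricing["total"],
            "buy_all_discount": pricing["buy_all_discount"]}


mock = RecordingMock({
    ("GET", "/orders/42/pricing"): {"total": 200, "buy_all_discount": 50},
})
slip = build_pack_slip(42, mock)
contract = mock.contract("warehouse-management", "financial-transactions")
print(json.dumps(contract, indent=2))
```

The printed JSON is the kind of artifact the provider's team could later replay against their own service.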

So the financial transaction service, it can have a test and the test's sole purpose is to grab this contract and execute it against itself. Ideally, we could test against the real thing but that's not really feasible, this achieves that. So the contract is what the WMS is expecting to happen in production. And then the financial transaction service can actually execute that contract and see if it actually does what the warehouse management system is expecting it to do. And so if all of that passes, then we can have relative confidence that everything's working in production. And what it does mean is if the financial transaction service makes a change, it can evaluate the potential change against this contract and if it violates the contract, it knows that that change will break the warehouse management system, so we don't go forward with that change.
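Provider-side verification might look roughly like the sketch below. It is a Python illustration under stated assumptions: the contract layout, the two handler functions, and the field names are hypothetical, and a real tool such as Pact does considerably more. The point is only that the provider replays each recorded request against itself and reports any field the consumer relied on that is missing or different.

```python
def current_service(method, path):
    """Hypothetical handler standing in for the real financial transaction service."""
    if (method, path) == ("GET", "/orders/42/pricing"):
        return 200, {"total": 200, "buy_all_discount": 50, "currency": "USD"}
    return 404, {}


def proposed_service(method, path):
    """The same handler after a proposed change that renames a field."""
    if (method, path) == ("GET", "/orders/42/pricing"):
        return 200, {"total": 200, "discount": 50, "currency": "USD"}
    return 404, {}


def verify_contract(contract, handler):
    """Replay each recorded request and return a list of violations."""
    failures = []
    for interaction in contract["interactions"]:
        req, expected = interaction["request"], interaction["response"]
        status, body = handler(req["method"], req["path"])
        if status != expected["status"]:
            failures.append(f"{req['path']}: status {status}, expected {expected['status']}")
            continue
        # Extra fields in the real response are fine; only expected ones are checked.
        for key, value in expected["body"].items():
            if key not in body:
                failures.append(f"{req['path']}: missing field {key!r}")
            elif body[key] != value:
                failures.append(f"{req['path']}: {key!r} was {body[key]!r}, expected {value!r}")
    return failures


contract = {
    "consumer": "warehouse-management",
    "provider": "financial-transactions",
    "interactions": [{
        "request": {"method": "GET", "path": "/orders/42/pricing"},
        "response": {"status": 200, "body": {"total": 200, "buy_all_discount": 50}},
    }],
}

print(verify_contract(contract, current_service))
print(verify_contract(contract, proposed_service))
```

The first call reports no violations; the second reports the renamed field, which is the signal not to go forward with that change.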

And so this kind of is nice, right? This is a nice property of having Synchronous services because they're very static, right? This picture here, that you are not expected to be able to read, was derived from our infrastructure description. And it's all the applications and all the Synchronous services that they consume. And I'm able to do this because the application... an application that consumes a Synchronous service, like, it knows that in its code or configuration. Like, it might not know where the service is, but it absolutely knows, "I need to contact something called the financial transaction service." So we can take that and do stuff like what I've described. So we can put a contract with every one of those lines and we can evaluate those contracts every time we make a change, and we can refuse to promote any change that violates these contracts. And we do this and this works as advertised. It's pretty nice and we don't have to stand up like every known service to run a test.

So, what about messaging? Did that make sense, kind of? Okay. So there's more to this pack slip printing thing than I let on, right? So the merchandise app, where our buyers, you know, do their business, they might decide to mark down the price of an item, and if that item is in a pack slip, we wanna regenerate that so the customer sees the new lower price. We might change the item metadata, like, there's a typo or maybe we've got a better description of an item, so we want the customer, again, to see that update. And then we could do something more drastic, change what items are actually in the order. And so then we want the pack slip to be updated. And so rather than have the pack slip or the warehouse management system, like, know all these details, it just consumes these messages. And whenever it gets one of these it's like, "Oh, something changed that I care about. I'm gonna go see if there's a pack slip that has that item in it. I'm gonna regenerate it and put it in the cache. And now, I'm up to date, everything's good."

So what we've introduced here is three new ways to break this entire process. The problem, though, is with these Synchronous Calls, right over here with the inventory metadata and all that, if that breaks it's pretty loud and obvious. If these messages cause something to not work, it's not obvious, right? You could imagine the price updated event. What if that gets sent to a different topic or a different routing key and it just never gets to the warehouse management system. We'll never know that, we'll never ever know. We'll potentially never know that these things didn't work. What if they're missing data? And so the warehouse management system ignores the message, right? We could potentially never know that this thing is broken. We don't want that. We wanna know that these things work.

Okay. So, how could we apply the contract-based testing that we just talked about to this? All right. So let's talk about the interaction between the styling app and the warehouse management system. Now, they don't really know that they interact but we know that they do. So the developers of the styling application, right? They'll write a test of some feature like adding an item to an order or taking an item out. And part of that test, regardless of anything else, that test is gonna also say, "Hey, I'm sending a message." Like, "When this happens, a message needs to get sent." So the test will assert that that happens. So we could capture that as an artifact to say, "I'm guaranteeing, under certain circumstances, that a message that looks like this is gonna get sent." Now, the warehouse management system... it has a similar test for when it gets this message, right? And it will hard code some sort of payload, feed that into the test and say, "When I get this payload on this topic or whatever, make sure that the pack slip gets regenerated." Instead of hard-coding the test data, it could just grab this guarantee and use that as input to its test, which... I'm gonna call an expectation. I'm making up all the words here. So, you know, they're not, like, official but... And so the cool thing about this is the styling application's actual output of what it actually does is feeding the test of the warehouse management system. But they don't actually necessarily have to know about that.

So, what's in these guarantees? So there's some sort of schema. You're probably gonna want a schema that says, "The payload is gonna conform to the schema." You can rely on that. You might have guarantees about the metadata. In one of the slides... we use RabbitMQ and that has this concept of routing keys. So that's, like, a part of our messages or whatever it might be. You also need some identifier, right? Because you want these things decoupled. So the warehouse management system wants to say, "I depend on that message," without knowing that it's coming from the styling application, it needs to not have to care about that. Now, the expectation, on the other hand, it has to have the ID of the thing that it's hooked into. "I'm expecting the message that someone has guaranteed me." It has its own schema and it may or may not be the same schema as the one the producer sends. Like, it could be, we'll talk about that later, but it doesn't necessarily have to be. It just says, "Whatever messages I get, they better conform to the schema and if they do, I'm good." They could have expectations about metadata as well. And you could also imagine that you might wanna have many different types of messages to feed different test cases or however that goes, right? Like, you might treat a markup and a markdown differently, so you might wanna be able to simulate that.

Okay. So, how, if we had all this in place that I'm hand waving for the moment, how could the consumer of a message safely make changes, right? Which means how can it make a change and know that it's not gonna be broken in production? Well, first, everyone has to agree on how we're defining guarantees. The producer of the message has to have some sort of test framework that will produce this artifact that I was talking about. And we need to have some sort of central authority that has all of the artifacts in it, right? So that when I'm writing the test, as the consumer, I can register my interest in this particular guarantee. And when I run my test, it will bring it down and it will feed it into my test case and make sure that I'm good. And if it passes, I can be pretty confident that I haven't broken myself by the messages that I know have been guaranteed are being sent.

So, the producer can also benefit from this. So the producer will never break because the producer is the authority of the messages. But the producer wants to know that they're not breaking anyone else. So, as we've mentioned, the consumer is grabbing this artifact and running a test. Well, the consumer could produce its own artifact of what happened, what it did, what it was expecting and what the results were. And it could publish its expectation up to the central authority. Now, when the producer makes the change and it's running its tests, well, it could go ask the central authority, "Who is expecting to get this message that I'm testing, that I'm sending?" And it could pull all of those down, evaluate those against itself to see if the change it's making is gonna break anybody that's expecting that message. Kinda makes sense? Yeah. Right. Cool.
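The flow between producer, consumer, and central authority could be sketched as follows. This is an in-memory Python stand-in, assuming a hypothetical `CentralAuthority` class and artifact layout; the talk does not specify how the real registry would store or serve artifacts, and the IDs and field names here are invented for illustration.

```python
class CentralAuthority:
    """In-memory stand-in for the shared registry of artifacts."""

    def __init__(self):
        self.guarantees = {}    # guarantee ID -> guarantee artifact
        self.expectations = {}  # guarantee ID -> expectations registered against it

    def publish_guarantee(self, guarantee):
        self.guarantees[guarantee["id"]] = guarantee

    def fetch_guarantee(self, guarantee_id):
        # None means nobody has promised to send this message.
        return self.guarantees.get(guarantee_id)

    def publish_expectation(self, expectation):
        self.expectations.setdefault(expectation["guarantee_id"], []).append(expectation)

    def expectations_for(self, guarantee_id):
        return list(self.expectations.get(guarantee_id, []))


authority = CentralAuthority()

# Producer test run: publish what will be sent.
authority.publish_guarantee({
    "id": "item-price-changed",
    "required_fields": ["item_id", "old_price", "new_price"],
    "sample_payload": {"item_id": 7, "old_price": 40, "new_price": 30},
})

# Consumer test run: pull the guarantee, use its payload as test input,
# then publish what this consumer relies on.
guarantee = authority.fetch_guarantee("item-price-changed")
test_input = guarantee["sample_payload"]
authority.publish_expectation({
    "guarantee_id": "item-price-changed",
    "consumer": "pack-slip-updater",
    "required_fields": ["item_id", "new_price"],
})

# Producer, before shipping a change: who depends on this message?
dependents = [e["consumer"] for e in authority.expectations_for("item-price-changed")]
print(dependents)
```

Note that the consumer only names the guarantee ID; it never has to know which application published it.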

So, the failure modes. There's one interesting one. There's two obvious ones but there's one interesting one. The interesting one is first, right? If I'm a consumer, I'm running a test and I'm expecting a message to be sent, and there is no guarantee in the central authority, that means I'll ship my code and it'll never be executed. I'll probably wanna know that before writing it. If the guarantee exists but my tests fail, that means when I go to production, I'm broken. Producer checks expectations. If they aren't compatible with what the producer's going to produce, then we know that at least one consumer is gonna be broken in production. So we could know these things in advance, in our CI system or we could prevent changes that cause these failures from happening.
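As a rough illustration of how these three failure modes could gate a CI run, here is a Python sketch. The function names, the artifact fields, and the idea of returning an exit code are assumptions for the example, not part of any system described in the talk.

```python
def consumer_failures(guarantee, consumer_test_passed):
    """Failure modes visible from the consumer's test run."""
    if guarantee is None:
        return ["no guarantee registered: this consumer code would never run in production"]
    if not consumer_test_passed:
        return ["consumer test fails on the guaranteed payload: consumer breaks in production"]
    return []


def producer_failures(guaranteed_fields, expectations):
    """Failure mode visible from the producer's test run."""
    failures = []
    for expectation in expectations:
        missing = sorted(set(expectation["required_fields"]) - set(guaranteed_fields))
        if missing:
            failures.append(f"{expectation['consumer']} would break: missing {missing}")
    return failures


def ci_gate(failures):
    """Exit code for a CI step: nonzero blocks promotion of the change."""
    return 1 if failures else 0


# A consumer written against a message nobody has promised to send.
orphaned = consumer_failures(None, consumer_test_passed=True)

# A producer change that renames new_price to updated_price.
renamed = producer_failures(
    ["item_id", "old_price", "updated_price"],
    [{"consumer": "price-cache", "required_fields": ["item_id", "new_price"]}],
)

print(orphaned, ci_gate(orphaned))
print(renamed, ci_gate(renamed))
```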

So you could write...you could, maybe just decide that everybody's using the same schema everywhere and kind of enforce that. And it sort of feels a little simpler, right? Like, all consumers of a message must accept the schema that the producer is sending and kinda do it that way. But if you had a system, like I've described, there's some additional benefits which are, I think, kind of interesting. The central authority, it could listen to the actual production traffic. It could receive every message sent in production and do stuff with it like, perhaps a message comes in and no one is guaranteeing that that message will be sent. Could mean that we were lacking test coverage, could mean unauthorized access to the messaging system, like, that would be good to know. I mean, how could you ever know that now? That would be very hard to know now.

What if a consumer is expecting a message that is, in fact, guaranteed to be sent but, in reality, the message hasn't been sent for weeks, right? That would be really, really hard to detect because it basically means that your code just stopped running. And that can often be harder to know about than your code ran and broke. So this could detect that, right? "Hey, you want this message and it hasn't been sent for a long time. Maybe you wanna look into what happened." It could also document the inter-dependencies between all these systems, right? Of course, we want them decoupled, of course, we want them to be able to change. But, at any given moment in time, there does exist a mapping of who's sending what and who's consuming what. And this system could document that, and that would make it easier to understand how these business processes are actually implemented. If I need to make a change to the pack slip printing process, it's pretty hard to piece together that those pieces are involved in this unless you happen to build it. So this could help figure that out.
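The two production checks described here, unguaranteed traffic and messages that have gone quiet, could be sketched like this. It is a Python illustration that assumes every production message can be mapped to a guarantee ID (for instance by routing key); the `TrafficMonitor` class, the IDs, and the seven-day threshold are hypothetical.

```python
from datetime import datetime, timedelta


class TrafficMonitor:
    """Compares observed production messages with registered artifacts."""

    def __init__(self, guaranteed_ids, expected_ids, max_silence):
        self.guaranteed_ids = set(guaranteed_ids)
        self.expected_ids = set(expected_ids)
        self.max_silence = max_silence
        self.last_seen = {}

    def observe(self, guarantee_id, seen_at):
        """Record one message; return a warning if no producer guaranteed it."""
        self.last_seen[guarantee_id] = seen_at
        if guarantee_id not in self.guaranteed_ids:
            return f"no guarantee covers message {guarantee_id!r}"
        return None

    def silent_too_long(self, now):
        """Expected messages that were never seen or not seen recently."""
        return sorted(
            gid for gid in self.expected_ids
            if gid not in self.last_seen or now - self.last_seen[gid] > self.max_silence
        )


now = datetime(2017, 11, 2, 12, 0)
monitor = TrafficMonitor(
    guaranteed_ids={"item-price-changed", "order-items-changed"},
    expected_ids={"item-price-changed", "order-items-changed"},
    max_silence=timedelta(days=7),
)

recent = monitor.observe("item-price-changed", now - timedelta(hours=1))
old = monitor.observe("order-items-changed", now - timedelta(days=21))
unknown = monitor.observe("inventory-purged", now)

print(unknown)
print(monitor.silent_too_long(now))
```

The same `last_seen` and expectation data is also the raw material for the dependency map mentioned above: who sends what, and who consumes it.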

Okay. So, I'm gonna go one level deeper from what I did. I'm still hand waving, right? Because we're still imagining things. But how would the verification, that I hand-waved over, how would that actually work, right? So the guarantee I mentioned is a schema. So we could make sure that the messages that are getting sent, like, you know, conform to that schema. The expectation also is a schema and, as I mentioned, they could be the same. You could just decide that they're all the same but that has this, like, tight coupling, right? If you just assume everyone has to have the same schema, yeah, you could do that but you might not want that tight coupling, right? Perhaps the consumers only care about a couple of fields in some payloads, so why should they be expected to care about the ones that are extraneous to them? So, I was trying to find this as a concept. So if the guaranteed schema subsumes the expectation schema, right? The schema that's going out from the producer subsumes the schema that everyone's expecting to receive, then everything should be fine. I couldn't really find a formal definition of this and it might not be a real thing. But let's have an example.

So, here's a schema of an item price change event, right? So, we've got three fields, they're all required. I left that out, but they're all required: item ID, old price, new price. So our consumer doesn't care about the old price, the consumer just cares about the item ID and the new price. So here's the schema the consumer's expecting, right? So we can see by inspection the producer's schema subsumes this one, so therefore, any message that conforms to the producer's schema will conform to this. Meaning, the producer can add fields. Like here, the producer's adding the user ID of who initiated the price change. Well, the consumer doesn't care about the ID, they're just gonna ignore it anyway so this is totally safe to add, this still subsumes. But the producer could change the name of a field, right? It used to be called, "New price," and now it's called, "Updated price." Well, this no longer subsumes it and it will actually break the consumer because the consumer's relying on that field to be there. The consumer can break itself. The consumer can say, "Hey, I'm expecting this field called reason to show up in the message," but the producer never promised to send that. So the producer's schema does not subsume this one, so the consumer will break.
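For flat schemas like the ones in this example, the subsumption check is easy to write down. The Python sketch below assumes a deliberately simplified schema representation, a mapping of required field name to type name, which is an invention for this illustration; it says nothing about whether the check is tractable for full JSON Schema.

```python
def missing_from(guaranteed, expected):
    """Fields the consumer requires that the producer does not guarantee as-is."""
    return sorted(
        field for field, type_name in expected.items()
        if guaranteed.get(field) != type_name
    )


def subsumes(guaranteed, expected):
    """True when every message valid for the producer is valid for the consumer."""
    return not missing_from(guaranteed, expected)


producer = {"item_id": "integer", "old_price": "number", "new_price": "number"}
consumer = {"item_id": "integer", "new_price": "number"}

# The three changes walked through above.
with_user = {**producer, "user_id": "integer"}
renamed = {"item_id": "integer", "old_price": "number", "updated_price": "number"}
wants_reason = {**consumer, "reason": "string"}

print(subsumes(producer, consumer))          # True: consumer ignores old_price
print(subsumes(with_user, consumer))         # True: adding a field is safe
print(missing_from(renamed, consumer))       # ['new_price']: rename breaks the consumer
print(missing_from(producer, wants_reason))  # ['reason']: consumer breaks itself
```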

So, again, this is kinda hand wavy but we're getting closer to something real. We saw some brackets and colons so that's one step forward. Yeah. So, schemas are complex. Like, those are really simple ones and, like, the stuff you can do like in a JSON schema is very complicated. And I don't know that you could programmatically check the subsumption concepts. Like, I couldn't really find... I found a few people asking for it but no one had said that it could be done, so probably might not be a thing. Uniquely identifying these guarantees, right? Think about the systems you work on that have events or messages. Like, how many different ones are there? Can you think of a name for all of them that is comprehensible but doesn't introduce the weird coupling, right? So we could call it, "Styling app changes order items," but that's saying the styling app, like, we're not supposed to know that and we're not supposed to care about that. And the owners of the styling app might wanna send this message from somewhere else later. And so now they have this weird guarantee ID that has an app name that's not even sending it. That doesn't make any sense. You could be too vague, right? Changes order items, like, "Is this the only place that's gonna change order items?" Like, maybe not. Unclear how to specify this.

And then you need to motivate the developers to actually write these tests. So, I talked about consumer-driven contracts for Synchronous services. We use a thing called, "Pact," and it has a way to write the test and it's pretty nice, but it's something. It's friction that the developers have to do and it's hard to enforce. You have to do it by code review. So whatever this system is, the developers have to be relatively incented to do it and it has to be not that hard or they're just not gonna do it. And, you know, this stuff has to be built. I've been imagining things this whole time. The system I'm imagining, like, it doesn't exist. I think it could exist.

So, I took some time on the plane over here. So, I live in Washington D.C. It's about a five-hour plane ride. So, I figured, "Can I hack this together in a way to see if this concept even works?" So, I set up some producers and consumers, right? So we've got this item price updater. It sends a message like that which says the price of this item is updated. So that connects to Rabbit and just sends a bunch of messages. The price cache is a consumer for whatever reason keeps a cache of the price of items. Maybe our finance team wants that. And then we have the pack slip updater which, you know, kinda works with the example that we've been talking about. So, I had this set up locally. Producing, consuming, all that's good.

So, I wanna write a test to do what I'm thinking and I'm really sorry this is Ruby and RSpec. I will explain the things that are relevant. This is what I do so it was the quickest thing. So, generally, we're saying, this is the thing that we're gonna initiate that we're gonna test. Our updater is updating the price of an item and then what should happen when that happens. Well, two things, one is very obvious. It should update the price of the item, who cares. But this is the interesting part here. So let's look at that. And, again, really sorry that it's Ruby.

So we're gonna expect that we got a message sent on the bus. This thing, Pwwka::Transmitter, that's like an API we use to talk to Rabbit. So this is saying like, "A message should have been sent when the code was executed." So this thing, the thing that implements this is gonna check to see if the payload that the code sent matched the schema. So the producer is providing the schema, we're gonna check against that. And they're also gonna check that the payload that was sent matches this, right? Because you wanna assert that some values got into the payload. And this is gonna produce that guarantee that we talked about. So, here's the schema that we've written that says, "We will conform to the schema." And then we run that test that I showed you and it outputs this. Don't worry about the tiny font. So this is, again, I made this up but this actually does work. So here's the ID of our guarantee. So this is how others can register their interest in this message. Here is the schema just copied from what we saw. And then here's an example payload that other people can use to feed their test if they want to. This is the payload that was actually produced from the test, so this is kind of as real as it gets without actually integrating.
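The slides with the Ruby/RSpec code are not part of this transcript, so the following is a Python approximation of the producer-side check just described, not the speaker's code. `FakeTransmitter`, `update_price`, `guarantee_from`, the routing key, and the schema subset are all hypothetical. It asserts that a message was sent, that the payload conforms to the producer's schema and contains the expected values, and then emits a guarantee artifact with an ID, the schema, and the payload actually produced.

```python
import json

TYPES = {"integer": int, "number": (int, float), "string": str}


def schema_violations(payload, schema):
    """Check a payload against a small JSON-Schema-like subset (required + types)."""
    problems = [f"missing {name}" for name in schema["required"] if name not in payload]
    for name, spec in schema["properties"].items():
        if name in payload and not isinstance(payload[name], TYPES[spec["type"]]):
            problems.append(f"{name} is not of type {spec['type']}")
    return problems


class FakeTransmitter:
    """Records messages instead of publishing them to RabbitMQ."""

    def __init__(self):
        self.sent = []

    def send_message(self, payload, routing_key):
        self.sent.append({"payload": payload, "routing_key": routing_key})


def update_price(item_id, old_price, new_price, transmitter):
    """Hypothetical producer code under test."""
    transmitter.send_message(
        {"item_id": item_id, "old_price": old_price, "new_price": new_price},
        routing_key="item.price.updated",
    )


def guarantee_from(transmitter, guarantee_id, schema, including):
    """Assert a conforming message was sent, then return the guarantee artifact."""
    assert transmitter.sent, "expected a message to be sent"
    message = transmitter.sent[-1]
    problems = schema_violations(message["payload"], schema)
    assert not problems, problems
    for key, value in including.items():
        assert message["payload"].get(key) == value, f"{key} should be {value!r}"
    return {"id": guarantee_id, "routing_key": message["routing_key"],
            "schema": schema, "sample_payload": message["payload"]}


SCHEMA = {
    "required": ["item_id", "old_price", "new_price"],
    "properties": {"item_id": {"type": "integer"},
                   "old_price": {"type": "number"},
                   "new_price": {"type": "number"}},
}

transmitter = FakeTransmitter()
update_price(7, 40, 30, transmitter)
guarantee = guarantee_from(transmitter, "item-price-changed", SCHEMA,
                           including={"item_id": 7, "new_price": 30})
print(json.dumps(guarantee, indent=2))
```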

So that exists, so what about the consumer? So, here's the consumer and I've bolded the thing that is important, but the consumer is doing things opposite, right? The consumer is saying, "I want something fed to me as input to my test." So that's what this first bolded thing is doing. It's saying, "Send me a message and I want... this is the one I'm guaranteeing." So it's gonna find from the central authority, which was my laptop, a guarantee with this ID. And then make sure that the payload that you're gonna send matches my schema, right? So the consumer is saying, "This is the schema I expect." So it'll check that and then write out an expectation with some metadata so we know kinda who's expecting this. So this kinda doesn't matter but turned out to be interesting and needed. And then we execute, you know, we execute our code and see that everything worked.

So this is the expectation, sorry the slide's wrong, expectation that is produced by that test, right? So we've got the ID of the guarantee that we are coupled to, so that's the same thing we saw in the other one. We've got our schema, and you'll notice we only have two fields because we only care about those two fields. We don't care about the third one. And then here's the payload that was generated and fed into our test. So we know that that payload got into our test and that we passed.
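The consumer side can be approximated the same way. Again this is a Python sketch rather than the Ruby from the slides: the `AUTHORITY` dictionary stands in for the central authority, and `receive_guaranteed`, `update_pack_slip_cache`, and the artifact fields are hypothetical names. The test input comes from the guarantee, it is checked against the consumer's own two-field schema, and an expectation is produced alongside.

```python
TYPES = {"integer": int, "number": (int, float), "string": str}

# Stand-in for the central authority: guarantee ID -> guarantee artifact.
AUTHORITY = {
    "item-price-changed": {
        "id": "item-price-changed",
        "sample_payload": {"item_id": 7, "old_price": 40, "new_price": 30},
    }
}


def receive_guaranteed(guarantee_id, expected_schema, consumer):
    """Return (payload, expectation), or raise if the guarantee cannot feed the test."""
    guarantee = AUTHORITY.get(guarantee_id)
    if guarantee is None:
        raise LookupError(f"no guarantee with ID {guarantee_id!r}")
    payload = guarantee["sample_payload"]
    # The consumer's schema lists only the fields it cares about.
    for field, type_name in expected_schema.items():
        if not isinstance(payload.get(field), TYPES[type_name]):
            raise ValueError(f"guaranteed payload lacks a valid {field!r}")
    expectation = {"guarantee_id": guarantee_id, "consumer": consumer,
                   "schema": expected_schema, "payload": payload}
    return payload, expectation


def update_pack_slip_cache(cache, payload):
    """Hypothetical consumer code under test."""
    cache[payload["item_id"]] = payload["new_price"]


cache = {}
payload, expectation = receive_guaranteed(
    "item-price-changed",
    {"item_id": "integer", "new_price": "number"},
    consumer="pack-slip-updater",
)
update_pack_slip_cache(cache, payload)
print(cache)
print(sorted(expectation["schema"]))
```

Asking for a field the producer never promised, such as `reason`, makes `receive_guaranteed` raise before the consumer code even runs.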

Okay. So the PackSlip Spec, right? Same consumer, slightly different. So it's overriding the sample payload that's gonna be sent into it, but the overriding payload is also gonna be checked against a schema. So if we were to write something in here that is invalid, it would fail the test. Good. So, this does work. Here's an animated GIF. So I'm running that. And so we can see that the producer correctly figured out that it sent the right thing and it knows these three consumers want its messages, and it checked and everything's good. The consumers similarly less exciting output but the consumers execute all that code that we just saw and verify everything that I just said. That's cool. So the proof is in the pudding though, let's break something.

So let's say we're working on a new feature in our consumer, and we want a reason that the item price was changed. This will break because we're not gonna be sent this. So when the producer goes to verify, it's gonna pull that expectation down, and lo and behold, it detects properly that this consumer's gonna break. This consumer's expecting something from it that it is not giving. And so therefore, there's going to be a problem. And we get a decent explanation, courtesy of the Ruby stuff that generates this. And our test failed.

So, how real is this? I've shown you animated GIFs, I've hand-waved. There is a GitHub repo, you could go look at it. All that stuff is here, it actually does work. It just runs locally on your laptop. But it was enough for me to feel like this concept has potential and I think I want to use it, right? This seems really handy because it kinda does what, you know, what I was hand-waving that it would. So that's kinda cool. Fortunately, this is the last slide.

So that's all I have. Follow me on Twitter, come talk to me for a job or read my blog. And, thank you.
