We have launched
Hi there! This is Dan - Batch co-founder and CTO.
I am super excited to present you with the result of months of ridiculously hard work - the self-serve Batch platform!
The big-data world is filled with buzzwords and marketing lingo; you practically need a PhD to decipher what a company actually does. So I'm just going to describe us in plain terms.
Batch is a message-bus agnostic observability and message replay platform. Batch plugs into your message-bus (Kafka, Rabbit, SQS, MQTT, whatever), analyzes and indexes each message in real time and allows you to search through them using Lucene syntax. Once you've found the events you care about - you can replay them to whatever destination you want - be it another Kafka instance or an HTTP API.
Saving the best for last - all events have their schema automatically inferred - as in, you do not have to define a "table schema" as you would in most other data platforms - we discover the schema on-the-fly AND write the data to an S3 bucket of your choice in an optimized parquet format.
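To make "schema inferred on-the-fly" concrete, here is a toy sketch of the idea (an illustration under my own assumptions, not Batch's actual implementation): walk each decoded JSON message, record a type per field, and widen a field's type when messages disagree.

```python
import json

# Toy illustration of on-the-fly schema inference (not Batch's real
# code): record a type per field, widening to "mixed" on conflict.
def infer_schema(messages):
    schema = {}
    for raw in messages:
        for field, value in json.loads(raw).items():
            t = type(value).__name__
            if schema.get(field, t) != t:
                t = "mixed"
            schema[field] = t
    return schema

events = [
    '{"user_id": 42, "action": "login"}',
    '{"user_id": "abc123", "action": "logout"}',
]
print(infer_schema(events))  # {'user_id': 'mixed', 'action': 'str'}
```

A real implementation has to handle nesting, nulls, and type promotion rules suitable for parquet, but the core idea is the same: the schema comes from the data, not from you.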
Batch is something truly new - we support virtually all message bus tech, all encoding types and provide you with all of the necessary tools to inspect, replay and route your data to wherever you like.
We've seen it a hundred times over and have dealt with it personally: message buses are black boxes. Getting data into them is not terribly difficult; knowing what is actually inside them is.
Did the message that you published have all the correct fields? What fields are filled out by other services publishing to this message queue? Is this the right message envelope to use for this event type? How many of these messages have been published to the message bus?
Of course, with a bit of elbow grease, it is possible to get this info. You write some throw-away publisher code, some consumer code and hopefully get a glimpse into what's happening right now on your message bus. The difficulty of this step varies quite a bit between different message buses but the general sentiment is the same - it's a pain.
And if you are working on complex event-driven systems that utilize event sourcing, you now have to figure out:
- How to store events
- How to search events
- How to replay events
The problem is further exacerbated in a larger event-driven architecture: with 10+ services emitting and consuming different messages, it is that much harder to pinpoint the exact messages you are interested in.
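To make the DIY burden concrete, here is roughly what the smallest possible version of those three bullets looks like - an in-memory toy of my own, with none of the durability, indexing, or scale a production system needs:

```python
from dataclasses import dataclass, field

# A toy event store covering the three bullets: store, search, replay.
# Real systems also need durability, indexing, ordering guarantees,
# schema handling, and backpressure -- which is where the work is.
@dataclass
class EventStore:
    events: list = field(default_factory=list)

    def store(self, event: dict) -> None:
        self.events.append(event)

    def search(self, **criteria):
        return [e for e in self.events
                if all(e.get(k) == v for k, v in criteria.items())]

    def replay(self, destination, **criteria) -> int:
        matches = self.search(**criteria)
        for event in matches:
            destination(event)  # e.g. publish to another topic or an API
        return len(matches)

store = EventStore()
store.store({"type": "order.created", "id": 1})
store.store({"type": "order.cancelled", "id": 2})
replayed = []
store.replay(replayed.append, type="order.created")
print(replayed)  # [{'type': 'order.created', 'id': 1}]
```

Every team that goes down this road ends up building some version of this - and then maintaining it.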
Rather than reinventing the wheel - we decided to build Batch to address all of these problems.
How are we addressing the problem?
When building highly distributed systems, you have likely explored the possibility of going event driven. But then you read some more docs and articles and realized that it's a serious amount of work... and maybe you don't have the time for all that jazz. And it's true - a good event driven strategy is a lot of work. You need a lot of pieces before the puzzle starts making sense.
Having been down this road before, one of our primary goals with Batch was to try and make event driven and event sourcing more accessible to everyone. Batch achieves this goal by offering the following functionality:
Batch eliminates the need to write any throw-away code - we have an indexed copy of every single message that has ever passed through your message bus. Any engineer - data scientist or backend developer - can have immediate access and visibility into your data stream.
You no longer need a runbook explaining how to write a sample Kafka consumer (and set the topic to Foo and set the offset to 48271 and connect with these TLS settings and download this certificate bundle and ... you get the idea).
Instead, log into our dashboard, select the collection that contains your events and search for whatever snippet of data you are looking for.
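The search itself is just Lucene-style field:value queries. The queries below are hypothetical examples of my own, and the matcher is a deliberately tiny illustration of the idea (Batch accepts full Lucene syntax, which does far more):

```python
# Minimal illustration of field:value search with AND clauses
# (hypothetical queries; Batch itself accepts full Lucene syntax).
def matches(query: str, message: dict) -> bool:
    for clause in query.split(" AND "):
        fld, _, want = clause.partition(":")
        if str(message.get(fld)) != want:
            return False
    return True

msgs = [
    {"service": "billing", "status": "failed"},
    {"service": "billing", "status": "ok"},
]
hits = [m for m in msgs if matches("service:billing AND status:failed", m)]
print(hits)  # [{'service': 'billing', 'status': 'failed'}]
```

Compare that one-line query with the runbook above - that is the difference we are after.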
Batch stores all of your data, forever. We store it in two locations:
- Hot - to facilitate ultra-fast search
- Cold - to facilitate message replays
Our hot storage resides almost entirely in memory and represents the latest data we've collected (<6 months), while cold storage contains ALL of your message bus data.
The cold storage resides in S3, stored in standard parquet format, using a schema we inferred at collection time. Best part - we can store this data in an S3 bucket of your choice and you can use the organized data for whatever purposes you see fit.
In other words, if you've ever thought about creating a data lake - Batch is a fantastic way to hydrate it without having to write a single line of code.
We're told that data scientists love this feature 😊
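The hot/cold split described above can be sketched in a few lines - this is an illustration under my own assumptions, not Batch's internals: recent events stay in a fast in-memory tier for search, while everything ever collected accumulates in a cold archive (standing in for S3/parquet) used for replays.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

# Toy sketch of two-tier storage (assumptions, not Batch internals).
HOT_WINDOW = timedelta(days=180)  # roughly the "<6 months" of hot data

class TieredStore:
    def __init__(self):
        self.hot = deque()  # (timestamp, event), newest last
        self.cold = []      # everything, ever (stands in for S3/parquet)

    def ingest(self, ts, event):
        self.hot.append((ts, event))
        self.cold.append((ts, event))
        cutoff = ts - HOT_WINDOW
        while self.hot and self.hot[0][0] < cutoff:
            self.hot.popleft()  # age out of the fast tier only

now = datetime.now(timezone.utc)
store = TieredStore()
store.ingest(now - timedelta(days=400), {"id": 1})
store.ingest(now, {"id": 2})
print(len(store.hot), len(store.cold))  # 1 2
```

The point of the split: searches only ever touch the small, recent tier, while replays can reach back through the full archive.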
If you have ever tried to build replay yourself, you have had to answer questions like:
- Where do you store the events?
- How do you store events... forever?
- In what format?
- How do you query the events?
- How do you perform the actual replay?
- How do you maintain these components?
Batch's replay functionality addresses all of these concerns and more.
From the very beginning of Batch, "replay" has been at the core of what we do. Our replay mechanism leverages our storage tech, our search tech and our ability to talk to virtually any message bus.
Batch's replay gives you a really powerful capability right away - without having to build anything. No frameworks, no custom libraries, no custom code.
Who is Batch?
Batch was started by me (Dan) and Ustin - we are data nerds who have worked with message tech for a really long time and noticed that we kept having to build similar systems over and over throughout our careers.
Batch was built over the course of a year by a team of industry-hardened engineers - all with a common interest in messaging systems and event driven technology.
These are the excellent folks who built Batch:
Why did you build Batch?
We built Batch because nothing like it exists. Plenty of vendors are geared towards one specific messaging technology, but none of them can "talk" to all of the buses.
In our experience, messaging tech can be intimidating - our goal was to create something that makes building distributed systems easier, faster and more accessible to everyone involved.
We are focusing 100% of our efforts towards self-serve for the foreseeable future.
In the next 3-6 months, we plan to:
- Allow you to launch hosted plumber instances on your choice of cloud provider
- plumber is our OSS tool that we use for pumping messages from your message bus to Batch
- Currently, to pump data into Batch, you must launch a plumber instance on your infrastructure. This is OK but we realize that not everyone wants to run and manage additional components in their systems.
- Launch message collectors in multiple regions
- This will improve latency for those who aren't near us-west-2
- Listen to your feedback on how we can improve the product
- Tell us what is great, tell us what needs improvement and tell us what you need to make your life easier
I am seriously beyond delighted to be able to write this - we have launched and we cannot wait for you to try our platform.
What we set out to build was pretty ambitious and I am proud to say that we managed to pull it off. We hope to provide you with an amazing, trustworthy and reliable experience for years to come.