Cutting Edge AI
Cutting Edge AI is a podcast by Angel Invest Ventures, Europe’s most active super angel fund. Each episode examines how artificial intelligence is reshaping technology, business, and society, from research breakthroughs to applied use cases. Hosts Jens Lapinski and Robin Harbort speak with founders, engineers, and investors who are building the next generation of AI products and infrastructure, offering clear insights into what’s real, what’s emerging, and what’s next. Stay one step ahead of the curve on the journey to the next generation of AI.
#2 dltHub’s CEO Matthaus Krzykowski on Automating Data Pipelines for the AI Age
AI is changing how data moves. In this episode, we sit down with Matthaus Krzykowski, co-founder and CEO of dltHub, the company turning data pipelines into a native language for large language models. dltHub automates up to 95 percent of data-engineering workflows and is already used by thousands of developers and over 4,000 companies. Its open-source roots and recent selection for the OpenAI Startup Program reveal how quickly AI infrastructure is evolving.
Matthaus walks us through the rise of vibe coding, the launch of dltHub’s marketplace with more than a thousand AI-first code connectors, and what happens when agent workflows start running in production. We also unpack how MCP brings natural-language interaction to company data, why Python remains the connective tissue of AI systems, and how auditability and data quality are becoming the real frontier for automation.
A conversation about infrastructure catching up with intelligence—and the new rules of building in an autonomous world.
Matthaus Krzykowski: [00:00:00] So like there's all kinds of data engineering workloads which we automated. We essentially automated 90, 95 percent of data engineering workflows.
This is Cutting Edge AI brought to you by Angel Invest with your hosts, Jens Lapinski and Robin Harbort.
Robin Harbort: Our guest today is Matthaus. He's the co-founder and CEO of dltHub. dlt is disrupting the market for paid data source connectors with its cutting-edge approach. He caught the attention of OpenAI, and as a result dltHub was selected to be part of the latest cohort of the OpenAI Startup Program.
We will talk about the rise of vibe coding, what comes after AI agents, how data moves in autonomous organizations, and why dlt is cutting edge. This is the Cutting Edge podcast, where we talk about people doing cutting-edge work in AI. My name is Robin from Angel Invest. Let's go. [00:01:00]
Hey, Matthaus, welcome to Cutting Edge. Great to have you here.
Matthaus Krzykowski: Thanks for having me, Robin. Jens.
Robin Harbort: Jens, also welcome. He is our co-host today.
Jens Lapinski: What brought you here?
Matthaus Krzykowski: I think you guys asked me whether I want to join this podcast, and what you were curious to hear is a little bit of the story of what I'm doing as CEO and co-founder of dltHub, how I got here, and how I think AI is particularly impacting what we're doing and how we're trying to shape it.
Jens Lapinski: So why don't we start at the beginning. You know, at some point you're in school, you grow up, and then you become the CEO of a tech company. Talk us through that journey briefly: how did you make it from your school days to now doing this? How did that happen?
Matthaus Krzykowski: You know, retroactively you actually see the path; while you're on the path,
you're actually blind. I think [00:02:00] the key moments are probably these: I'm a Polish immigrant kid to Germany. I even got separated from my parents by the communists for a year when I was little, and I grew up in West Germany. And then I was a little bit unruly, and when I was 15 I decided to go a little bit independent and got myself into high school in Southern Africa.
I think I was a little bit on a tour of my own, right? In this sense, when I came back, at the time I thought I was a journalist, but very quickly I figured out that I'm not a journalist, and I went through very quick iterations of different jobs, of what I don't want to be and where I don't fit in.
And it was clear at some point that actually I would be much better as a manager of creative people. Then I tried to figure out what I need to study for that, and I asked people and tried to find entrepreneurs, and I got the advice that I need to study business and law and [00:03:00] some technology and maybe some art.
And that's what I did. I finished a business degree. I've done some mechanical engineering, and history of art and architecture. And very quickly, my first job was actually trying to sell art, because that was for the most entrepreneurial person I knew at the time.
It became very clear to me that art is not a good world to be in. So in 2006 I applied to eight different startups, and I got a job. And suddenly I was a product manager in e-commerce, and within two, three years I started to do my own thing, went to the States for a while in the early mobile days, and fast forward to today.
I'm the CEO and co-founder of dltHub. Many journeys. This is my first VC-backed [00:04:00] startup. To answer your question, I think the early steps are much more important than the later ones. You know, when I'm now sometimes mentoring kids, I tell them these early steps are much more important than the later ones.
Jens Lapinski: Because there is a much more natural path.
So three startups, and this is the third one. How did that start? Because it wasn't the case that you said, oh, I've had this idea, let's do this. There was much more of a searching activity before this, right?
Matthaus Krzykowski: That's correct, right. You know, when you embark on this journey, you need to know that you are doing it for 10 years or more.
And so you need to really have energy and really, really love it. So one of my past co-founders and I were really throwing spaghetti at the wall and doing demos every month. One of these demos, that was like five years ago, we got early access to GPT-3 and we did a prompting demo back then. And we showed it to the Rasa team.
That was AI agents before it was cool. And it was a Friday; on Monday they asked us whether we want to build data pipelines for their customers. At the time, Rasa had [00:05:00] 15 percent of the Fortune 500 using them, and suddenly we were building data pipelines. And that's where we discovered a problem, which is that you had these 20-something Python scripters, a young generation of coders, who were supposed to get machine learning and AI agents done, and they were fighting 35-, 40-year-old people on a much older technology.
And what is usually a boring panel at a boring conference, which is Python versus SQL, became: what if the bug is actually the feature, and what if there's something here? And then there were many more inflection points before we decided this is a real startup. You know, we saw the numbers: in 2017 there were 7 million of these Python developers.
Now it's 21 million. So that's the meta trend: Python is the language of AI. When we shipped one of these pipelines to a serious Sequoia-backed fintech, they told us after a year that they made $40 million [00:06:00] on top of our infra, and you saw real AI agents making a real difference. And then it became clear to us: okay, you know, there's something brewing here.
Why don't we do a data movement startup? And while there were 300-plus venture-backed startups in the ETL space, there was none which is really, truly done for Python. So why don't we do that for this new group of users? Because if we succeed, good things can happen.
Jens Lapinski: Do you wanna maybe give an example of what a data pipeline is when it's in production in a company, so that everybody here can understand exactly what you mean by a data pipeline?
Matthaus Krzykowski: That's a great question. So in a typical small startup, that can be anything from your database, which your app or your website or your product runs on; that can be a Postgres or Mongo or whatever you use. It could be your advertising data, like Facebook or Google Ads. It could be your [00:07:00] CRM, like Salesforce or HubSpot; the initial sets of data which the company needs.
Now, I think in this new Python world, it's not only like 20 or 50 different sources; there are actually tens of thousands, hundreds of thousands, millions of these. If you look at the total market of APIs, there are millions of them. There are millions of different types of data sets out there.
So one of the quick discoveries for us was actually that this old technology caters to a very, very small market and to a very limited number of data pipelines. And a data pipeline is not only "let's fetch the data once"; it becomes commercially interesting when a company decides that this is important data for our business and we need to run it daily, or multiple times weekly.
Because then people start paying you for runtime, and that's a traditional business model in my world: can you get [00:08:00] to being paid for runtime?
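To make that concrete, a minimal sketch of such a pipeline in dlt might look roughly like the following; the pipeline, dataset, and table names here are illustrative, and the destination could just as well be BigQuery, Snowflake, or Postgres rather than DuckDB.

```python
import dlt

# stand-in rows; in practice this would come from your Postgres/Mongo database,
# your Facebook or Google Ads account, or your Salesforce/HubSpot CRM
customers = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": "b@example.com", "plan": "free"},
]

pipeline = dlt.pipeline(
    pipeline_name="crm_to_warehouse",  # illustrative name
    destination="duckdb",              # or bigquery, snowflake, postgres, ...
    dataset_name="crm_raw",
)

# dlt infers the schema, normalizes the rows, and loads them into the destination;
# scheduling this to run daily or weekly is what turns a one-off script into a pipeline
load_info = pipeline.run(customers, table_name="customers")
print(load_info)
```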
Robin Harbort: Matthaus, your call to action is "vibe code any data source." Can you explain to me what that means?
Matthaus Krzykowski: One of the bets we took very early on was that if we make data movement Python code, and not like GUI or SaaS products which are behind a wall, then at some point the LLMs may understand us.
We did our first prototype around that, I think, two years ago; we even changed the library so that the models, like ChatGPT, would understand our Python library. And in September last year we saw how o1-preview indexed dlt; we had prepared for it. In a traditional startup you have search engine optimization, and we had already prepared thousands of pages optimized not for search engines but for LLMs.[00:09:00]
So instead of images and a little bit of text, we put a load of code in it, so that when there's a task, let's say, as I said before, from Stripe to your database, an LLM finds the necessary information to be able to execute the task. My vision for the company is that we want to be the big standard for data loading.
We want everyone to use dlt, and I want hundreds of thousands and millions of companies using dlt, and our path to get there is to make it super simple and super powerful. We think that the best way to do it is if we're part of these agent workflows, and the path to get there is that people are able to vibe code any source into something useful, which runs and has a big impact on their business.
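As a rough sketch of what a vibe-coded source can come out as, an LLM-generated dlt pipeline for a Stripe-like API might look something like this; the endpoint, pagination fields, and secret handling below are hypothetical and simplified, not a real Stripe integration.

```python
import dlt
from dlt.sources.helpers import requests  # dlt's requests wrapper with built-in retries

@dlt.resource(table_name="charges", write_disposition="append")
def charges(api_key: str = dlt.secrets.value):
    # hypothetical paginated endpoint standing in for a real payments API
    url = "https://api.example.com/v1/charges"
    params = {"limit": 100}
    while True:
        resp = requests.get(url, params=params,
                            headers={"Authorization": f"Bearer {api_key}"})
        resp.raise_for_status()
        page = resp.json()
        yield page["data"]  # dlt normalizes and flattens the nested JSON
        if not page.get("has_more"):
            break
        params["starting_after"] = page["data"][-1]["id"]

pipeline = dlt.pipeline(pipeline_name="payments",
                        destination="duckdb", dataset_name="payments_raw")
pipeline.run(charges())
```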
Robin Harbort: Do you know how many vibe coders currently use the code from DLT?
Matthaus Krzykowski: I don't know the number of vibe coders, but I [00:10:00] have some numbers, right? It's really early days. The first model which really, truly got dlt, where we thought, this makes sense, was Claude Sonnet, and that's like six, seven months old now. And we built LLM analytics in-house, which has been running for us for two months.
And in May we saw over 135,000 LLM requests. One of our key metrics is how many of these pipelines get built by our community. In January it was 2,400, and suddenly in May we had a jump to over 35,000. We also see which AI docs were hit by the LLM requests.
We see a huge boom in it. How many people? I think it's now already thousands.
Jens Lapinski: I think on your website it says that there are 1.7 million downloads per month.
Matthaus Krzykowski: So, right, not every download is like one person. A large [00:11:00] corporation could be 50,000 downloads. We've now crossed 4,000 companies using us in production.
And if I would do numbers of how many individual developers use dlt in a month, it's 10,000.
Jens Lapinski: So the idea behind dltHub is that it is a way to send data between applications. So you have different applications in production, continuously, back and forth, all the time. And in addition to that, you are enabling AI to understand the data that's being transferred.
Matthaus Krzykowski: I think that's right on the spot. And the value of it is: with a Python script, you can do it once, right? You know, you said we have 1.8 million downloads per month. There's a Python library called requests.
But the Python requests library, what it does is allow you to fetch the data once. The trick [00:12:00] is that APIs change, and people maybe don't want to fetch all the data from zero ever again. So there are all kinds of data engineering workloads which we automated; we essentially automated 90, 95 percent of data engineering workflows.
So when a company builds a dlt pipeline and it keeps on running, it's just a great, great experience. And what's new, as you said, and that's the second point, is that suddenly in the last six, nine months you have AI-first adoption. And this changes the game, because these data engineers in companies, if you talk to founders, there are only a few hundred thousand of these people worldwide.
Only a couple thousand get added to the labor pool, and they are super expensive; they're really hard to get. And the promise with this move is now that [00:13:00] suddenly you can give dlt to any Python scripter, any Python person, and suddenly you empower everyone in your company to fetch data, which is key to making your business more productive.
Right? Right now, the limit of Cursor and OpenAI and the like is not even the models; the limit is much more about which data, which clean, structured data, these models can work with and interact with. And we're cleaning up the data and making it easy for the LLM to interact with this data.
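A sketch of the difference being described here: a plain requests script re-downloads everything on every run, while a dlt resource can remember where it left off between runs. The helper, table, and field names below are hypothetical.

```python
import dlt

def fetch_tickets_since(since: str):
    # hypothetical stand-in for an API call returning rows changed after `since`
    yield {"id": 1, "status": "open", "updated_at": "2025-06-01T12:00:00Z"}

@dlt.resource(primary_key="id", write_disposition="merge")
def tickets(updated_at=dlt.sources.incremental("updated_at",
                                               initial_value="2024-01-01T00:00:00Z")):
    # dlt keeps the highest updated_at value in the pipeline state, so each run
    # fetches only new or changed rows instead of everything from zero
    yield from fetch_tickets_since(updated_at.last_value)

pipeline = dlt.pipeline(pipeline_name="support",
                        destination="duckdb", dataset_name="support_raw")
pipeline.run(tickets())
```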
Jens Lapinski: So now, the podcast is called Cutting Edge, and you're about to release something soon. Actually, by the time we release this podcast, you will have already released it. What's the cutting-edge thing that you're about to publish, or have just published, to the world?
Matthaus Krzykowski: So far, you know, we're known for our library, dlt, and people have been wondering for a long time why we're called dltHub.
And so we're releasing dltHub [00:14:00] itself, and in its initial form it's a marketplace with over 1,000 AI-first code connectors. And we think we can get the number of these AI-first code connectors to a couple thousand. We want to be the front runner that allows this, where suddenly everyone can prompt and make thousands and tens of thousands of datasets accessible to any general Python developer.
Jens Lapinski: How does that relate to MCPs? How does that plug in? How does that play with that?
Matthaus Krzykowski: MCP is a neighboring technology for us. With a Python script, you can fetch the data; it solves all these data engineering issues, so that the data keeps on running, so that it's clean and smooth. With MCP you can write a single connection, but the real strength of MCP is that you can now talk to the data inside.
So for us, as we expand on what we're building, [00:15:00] there will be a notebook experience: once you fetch any data out there, you'll be able to interact with any of these data sets through our commercial platform, through natural language. So it's a neighboring technology for us.
Jens Lapinski: So can you give us an example of what that would look like?
If I have an organization and I run both, what would I do with one? What would I do with the other? How would they relate? What does that look like in the future?
Matthaus Krzykowski: So maybe I'll give a real example. There is, for example, a coding framework called Goose, which is built by a company called Block, formerly Square.
And it's essentially a Copilot-competitor framework. And what they did is, for example, they connected 30 of their most popular SaaS sources, from Notion to Google Drive to anything. And they have 9,000-plus people on it, not only coders but also non-coders.
People in finance interact with it. Whenever something changes in these, somebody will have to [00:16:00] fix it, and if somebody wants to add a source which is not in the framework, it's pretty hard. With dlt, any of these people will be able to fetch any data source within 10 minutes and go from data source to something available, built in a notebook app.
Then with MCP on top of that, you'll be able to interact with it. So it's not either-or. You can use both technologies in parallel, and they're actually accelerating each other's utility
Jens Lapinski: because one's providing the data and the other one is using it in a certain way.
Matthaus Krzykowski: You can still build, you know, old-school connectors for MCP, but what we're doing, as I just described, is raising the bar of what's possible, essentially.
Like, imagine you have Robin at Angel Invest, and you can tell Robin: hey, I want something useful on top of the data. And Robin will be able to come back to you with an [00:17:00] analysis in 15, 20 minutes. And let's say it's on your data from Angel Invest, and you say: oh, I like that.
I want it to keep running. This is the value. We're going into an era of ad hoc code, right? Just like you have these frontend tools like Lovable or Bolt, where people can essentially mock up and create a frontend app within minutes, we'll have this on the data engineering and the data side as well.
Robin Harbort: Can you tell me more about this Goose tool? You mentioned that every enterprise application is basically connected. Does that mean that the users of Goose only use Goose for everything they want to do?
Matthaus Krzykowski: So, you know, imagine you're a normal company and you have all your tools: your spreadsheet tools like Google Drive, your Notion, your Jira, and so forth.
You know, I was listening to a presentation [00:18:00] from people from Block about Goose in San Francisco three weeks ago, and the way they narrated the examples was essentially: instead of having a ChatGPT outside, everyone uses Goose. And think of it as an interface to interact with this data.
And it's the data of your Notion, of your Google Drive, and it's tied to your company data. If I want to create something interesting which combines these data sets, right? Because that's very powerful. You can go do it through Goose. You can do really interesting reports on it, which connect different data sets.
And so at this company, they were saying that most people actually prefer to interact with these tools through Goose, and not through the graphical user interfaces of these tools.
Robin Harbort: We've also talked about vibe coding, and about dlt growing massively because of the emergence of vibe coding. What's also a hot topic is AI [00:19:00] agents.
What do you think comes after AI agents, or what is the next evolution step?
Matthaus Krzykowski: I was recently in San Francisco and there was a keynote at the Databricks show where the CEO of Databricks and the CEO of Anthropic were talking about it. And they were referring to a framework from a person called Steve Yegge, from a post called Revenge of the Junior Developer.
The vision was that we go from an age where, you know, we had things like autocomplete a year ago, to now, where we have vibe coding, and that the next thing coming will be something called agent fleets. And what we just described a little bit with Goose is this next iteration: right now you can interact through different agents, and your productivity is not only handling one agent; you are becoming a manager of multiple agents.
And the next [00:20:00] version would be that essentially the whole company runs on many, many agents.
Jens Lapinski: Yes.
Matthaus Krzykowski: And there are other competing visions, but essentially people are not only thinking about them; there are many organizations, I don't know, in New York Ramp is very forward-looking, that are actually implementing a lot of these things already today.
Jens Lapinski: I'm gonna give a presentation next month to the board of a larger organization in Germany, and one of the points is that the way in which software will work in the future is not how software has actually worked in the past.
So in the past, there is some code that's running on a machine, and the humans sit there and operate the machine and the code; they use the software programs. We've now already transitioned to a situation where, in a sense, the software is alive, say: it's doing [00:21:00] things in a much more human-like manner.
That's obviously related to this, and it's got lots of consequences, but one of them is that we no longer need to tell software what to do. In a sense, software can tell itself what it needs to do, and it autonomously does that. If you think about that autonomy, have you thought about how AI could use dlt or similar connectors, or, you know, something like that?
Basically the software wants to access all sorts of other data and all sorts of other tools and interact with that. It's gonna be one of the critical components of the entire AI stack, if you think about it. Because right now, you're giving software to humans to build connectors between different software applications, if you like.
Yeah. And run that in production. Have there already been instances where the AI has said, hey, I would like to have a connector to talk to [00:22:00] this, and then the AI made it, so that the AI could pull the data and then use that? Has that already happened, or is that the future?
Matthaus Krzykowski: These workflows exist already today.
We've seen this happening with dlt already. And I think it's not only with the starters, right? What was interesting, while I was in San Francisco, you know, at this Databricks conference, and Databricks is very forward-looking, they already had 2,500-plus large corporations, including Mastercard, on some of, not this workflow, but some
such workflows before. You know, there was a keynote talk with someone from Mastercard, and he was talking about how a lot of these types of early workflows are changing. I think they had thousands of agent workflows at Mastercard already. So it's not the future anymore; these things are happening.
And what you just described is like the next step, where actually, you know, like [00:23:00] reasoning models, or ideas where essentially you not only decide what to say, but you also formulate the actions. It's technology which OpenAI already pushes to people. It exists today.
Robin Harbort: Talking about OpenAI, you mentioned that your cutting-edge approach to connecting data sources also got the attention of OpenAI, and I heard that you visited them in New York. Is that correct?
Matthaus Krzykowski: Yeah, so I think formally we're a startup based and headquartered in Germany, but legally we're in New York. And there is something called the OpenAI Startup Program, and we got nominated to it. It doesn't exist in Europe, but it exists in San Francisco and New York, and there was the OpenAI Startup Day. We're part of multiple activities with them where they work with us on using OpenAI tooling, on getting things done.[00:24:00]
Robin Harbort: Do you know why OpenAI is doing this? Why are they collaborating and working so closely together with emerging startups?
Matthaus Krzykowski: OpenAI is organized in three divisions. They have research, they have foundations, and then they have applied. And in this applied division they essentially have all kinds of tooling, which is about: how do you get AI done?
And with us, you know, the idea is essentially that they want to get the next Cursor, they want to get the next ElevenLabs, which uses the API and essentially drives revenue. What they were doing with us specifically: there were essentially three tracks. One was about the latest models.
Then there was another one around agentic tooling; that was both the agentic framework as well as the Responses API. The Responses API is [00:25:00] what Jens just mentioned: it's not only about listening, but can you get the agent to actually take actions. Then there was another one around fine-tuning and distillation.
So fine-tuning is: can you get any of the OpenAI models to understand what you're doing. In our case, maybe dlt and data engineering, but there were other companies in the batch who were doing legal work or sales work, and you can work with them on fine-tuning the models to what you're doing.
And distilling is, you know: until a couple of years ago, building or fine-tuning a model was really, really costly. Now you can actually work with them to come up with much smaller models and custom models for your dataset, right? You upload your dataset and they work with you on making sure that the models work really well with what you're doing.
Robin Harbort: Okay. Matthaus, what do you think: is dlt already an autonomous organization, [00:26:00] or close to it?
Matthaus Krzykowski: The term autonomous organization has been around for a long time, right? There are companies like Lemonade, which is an insurance fintech in the US. We have 30 agents running and we're only 13 people, but we're right now serving 4,000-plus open source organizations using dlt.
So without us being autonomous to some degree, we would not be able to do this. So we're automating everything we can, while keeping a human in the loop. And what's interesting with these agent workflows, Robin, is that people are more honest. When humans talk to you, they are very smart; they try to give you social proof so that they appear smart.
When we have all these agents running, people are way dumber with them, asking the true questions and voicing the true problems they have with you and your software. [00:27:00] We think this is a huge advantage for us. Our insight into user behavior is so much better through our agent workflows than even through human interactions.
Robin Harbort: So currently AI at DLT is building the data connectors for some kind of other AI?
Matthaus Krzykowski: Yes and no. There's a conundrum here, Robin. In a way, this AI code world and its growth actually create new problems. Suddenly there will be not only a couple thousand experts but essentially millions of generalists getting superpowers, and we will not know much about it.
When AI works, it's great, but what about when it doesn't work, when it becomes a problem? The number of these problems will explode. So the interesting thing about this new age, I think: data quality, the understanding, being able to inspect what happened, being able to fix things [00:28:00] will become a much, much bigger problem.
And I think it's also an opportunity to make loads of money, when we talk to financial organizations, to help these organizations; especially, you know, they get audited. They need to have a real log of decisions taken by AI. So it's a very interesting space where actually new opportunities are rising up, and they need to control the risks around that.
That's correct.
Jens Lapinski: To manage to do risk management in the age of AI, which is very difficult to do.
Matthaus Krzykowski: That's correct. And we also need to, for example, prove that AI was not involved in certain tasks, and to log it; the systems need to become much more auditable, end to end.
So there are all kinds of interesting things happening here. It's not only about AI doing things; actually making sure that sometimes no AI was involved, or that there is a quality check, is becoming much more important.
Jens Lapinski: What do you think will be [00:29:00] cutting edge in AI next year?
Matthaus Krzykowski: I really don't know. I've been in the early mobile boom, when the iPhone happened. To me, this age feels the same. For me, the last four weeks of my life have been the craziest four weeks of my professional life. Things are happening every week, two weeks, and I tell people in my company: let's go left instead of right.
I'm pretty sure that in 10 years I will look back on this summer of '25 thinking I was part of a dotcom boom. I don't know what this is, but I'm pretty sure this is what's happening. I have some long-term bets in my space, which I prefer not to talk about, but it's such an exciting time that every week, every day, something new is happening.
Cool. Thanks for being here. Thanks Jens and Robin. Thank you.
Robin Harbort: Thanks for being here. If you enjoyed this episode, support us, leave a follow and share the Cutting Edge AI podcast. See you next [00:30:00] time.