Top Data Engineering Startups 2023

December 2023

Browse 75 of the top Data Engineering startups funded by Y Combinator.

We also have a Startup Directory where you can search through over 4,000 companies.

  • Fivetran
    Fivetran (W13) • Active • 1,200 employees • Oakland, CA
    Fivetran automates data movement out of, into and across cloud data platforms. We automate the most time-consuming parts of the ELT process from extracts to schema drift handling to transformations, so data engineers can focus on higher-impact projects with total pipeline peace of mind. With 99.9% uptime and self-healing pipelines, Fivetran enables hundreds of leading brands across the globe, including Autodesk, Conagra Brands, JetBlue, Lionsgate, Morgan Stanley, and Ziff Davis, to accelerate data-driven decisions and drive business growth. Fivetran is headquartered in Oakland, California, with offices around the world. 
    saas
    b2b
    analytics
    data-engineering
  • Airbyte
    Airbyte (W20) • Active • 110 employees • San Francisco
    Airbyte is the leading open-source ELT platform that replicates data from applications, APIs & databases to data warehouses, data lakes, and other destinations. https://github.com/airbytehq/airbyte
    developer-tools
    open-source
    data-engineering
  • Supabase
    Supabase (S20) • Active • 70 employees • Singapore
    Supabase is the easiest way to get started with Postgres. Each project within Supabase is an isolated Postgres cluster, allowing customers to scale independently while still getting the features they need to build: instant database setup, auth, row-level security, realtime data streams, auto-generated APIs, and a simple-to-use web interface. We are 100% remote.
    developer-tools
    open-source
    big-data
    data-engineering
    databases
  • TRM Labs
    TRM Labs (S19) • Active • 180 employees • San Francisco
    At TRM, we're on a mission to build trust in digital assets, because the promise of crypto is too valuable to be impeded by bad actors. We provide a blockchain intelligence platform to law enforcement, financial institutions, and crypto firms to assist in the detection and prevention of cryptocurrency fraud and financial crime. Our vision is to build a company that can sustainably deliver on our mission for decades to come, enabling consumers to transact safely and securely on the blockchain. Join our mission ➔ www.trmlabs.com/careers
    fintech
    machine-learning
    crypto-web3
    data-engineering
  • Gecko Robotics
    Gecko Robotics (W16) • Active • 230 employees • Austin, TX
    The mission of Gecko Robotics is to improve the state of the world by helping the most important institutions ensure the availability, reliability and sustainability of critical infrastructure. Gecko's combination of wall-climbing robots, industry-leading sensors, and an AI-powered data platform gives customers a unique window into the health of their physical assets, allowing real-time decisions that prevent power outages, ensure military missions succeed, and help reduce energy costs.
    robotics
    energy
    big-data
    data-engineering
    ai
  • Mezmo
    Mezmo (W15) • Active • 172 employees • San Jose, CA
    Mezmo, formerly LogDNA, is an observability platform to manage and take action on your data. It ingests, processes, and routes log data to fuel enterprise-level application development and delivery, security, and compliance use cases. Mezmo was brought to life by three-time co-founders Chris Nguyen and Lee Liu and was part of Y Combinator's Winter 2015 batch. In 2018 the company partnered with tech giant IBM to become the sole logging provider for IBM Cloud. Mezmo is on a mission to empower people who build solutions that shape the world. We’re doing this by delivering a platform that enables enterprises to get more value from their observability data in real time, regardless of source, destination, use case, or scale. We’re not the only ones working on this problem, but we have a few things the others don’t. We’re cloud-native and know how to make the most of modern technology like Kubernetes. We have scaled a solution from zero to petabyte scale in a short amount of time, while supporting thousands of active users across multiple environments. We are hungry for change and are surrounded by enterprises telling us they’re hungry, too. We have a kick-ass group of people who are thinking about the problem analytically and are excited to change the observability world for the better. Mezmo has helped some of the world’s most innovative companies transform how they manage their systems and applications. Still, we know that we can help them get more value from their observability data by providing more flexibility and control over how they use it. This will enable teams to spend less time switching between data silos so they can focus on shipping better, more resilient, and more secure products. We have momentum on our side. Last year we saw triple-digit revenue growth and added 800 new customers to our roster. Recent accolades include being named to YC’s Top Companies, CRN’s 10 Hottest DevOps Startups, and EMA’s Top 3 Observability Platforms.
    developer-tools
    devsecops
    saas
    kubernetes
    data-engineering
  • Spruce Systems
    Spruce Systems (W21) • Active • 25 employees • New York
    Spruce lets users control their data across the web. We believe that the world is evolving toward one based on cryptography, networks, and digital economies that are user-controlled. Today, the dominant use case for user keys is the signing of blockchain transactions, but we think this barely scratches the surface of what is possible. Soon, the entirety of a user’s digital interactions will be based on their keypairs, and we’re unlocking this transition with our constellation of products. We are passionate about cultivating a thriving culture of diverse individuals who bring unique perspectives to our mission. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.
    crypto-web3
    identity
    open-source
    privacy
    data-engineering
  • MovingLake
    MovingLake (S22) • Active • 3 employees • Mexico City, Mexico
    MovingLake is Fivetran for event-driven architectures. Companies such as Casai use our product to obtain orders and price changes in real time.
    saas
    b2b
    analytics
    api
    data-engineering
  • Sunpia
    Sunpia (S22) • Active • 3 employees • San Jose, CA
    Sunpia lets developers easily experience the cost and speed benefits of serverless infrastructure, without having to rewrite their code. Developers annotate their code and Sunpia automatically designs a microservice version of it they can deploy on their own cloud.
    developer-tools
    kubernetes
    data-engineering
  • Findly
    Findly (S22) • Active • 6 employees
    Findly.ai is the ChatGPT for Google Analytics that revolutionizes how businesses understand and interact with their data. Through an engaging chat environment, it lets decision-makers gain insights, request reports, and generate visualizations based on their company's metrics. This is made possible by a metric layer that understands all of your company's metrics. Chat-based exploration simplifies complex data analysis, letting users generate comprehensive summaries with a single click and export them to various formats. With scheduled chats and action-triggered automations, Findly.ai also makes decision-makers more autonomous and efficient. It's more than a tool; it's a decision-making operating system that helps decision-makers achieve their KPIs while spending less time waiting for data.
    generative-ai
    b2b
    chatbot
    data-engineering
    ai
  • Lamin
    Lamin (S22) • Active • 4 employees • Munich, Germany
    Manage data & analyses with an open-source Python framework. Collaborate across dry and wet labs in a distributed data hub. Get started on your laptop and deploy anywhere.
    developer-tools
    machine-learning
    biotech
    open-source
    data-engineering
  • Grai
    Grai (S22) • Active • 3 employees • San Francisco
    Grai is open-source version control for metadata. We can determine how database changes will affect deployed machine learning models, APIs, and dashboards because we understand how data relates across systems that don’t otherwise talk to each other.
    developer-tools
    saas
    analytics
    open-source
    data-engineering
  • Bracket
    Bracket (W22) • Active • 3 employees • New York
    Bracket is the two-way data pipeline between popular business tools and backend databases. When ops teams update data in Salesforce or Airtable, and engineers update data in the database, Bracket connects the two sources to reflect the same information.
    saas
    b2b
    data-engineering
  • Trackingplan
    Trackingplan (W22) • Active • 8 employees • Barcelona, Spain
    Trackingplan automatically discovers and monitors all the information your applications and websites are collecting, ensuring that you can trust your BI, analytics, marketing, and sales tools. You can think of us as Segment Protocols but totally transparent: developers can keep using Google Analytics, Amplitude, HubSpot, Intercom, Braze, etc. as they are used to. Installed in minutes using your tag manager or by adding just one line of code to your website or apps, we model all the data being sent to third parties. Because Trackingplan understands what each piece of data means, it identifies patterns, detects anomalies, and automatically connects the dots to create value from data that was hidden in plain sight: - An always up-to-date single source of truth and data governance tool, to discover, understand, and document your data and improve communication across teams. - Automated notifications when something breaks or changes, to make sure integrations are always implemented correctly: schema errors, traffic anomalies, rogue events... - Easy-to-understand, customizable, cross-service alerts, to detect trends, insights, and problems without complex, engineer-oriented solutions.
    saas
    analytics
    data-engineering
  • Hydra
    Hydra (W22) • Active • 7 employees • San Francisco
    Open-source Snowflake alternative. Query billions of rows instantly on column-oriented Postgres. Hydra can be used as open source, as a managed cloud, or deployed in customer cloud infrastructure. Get parallelized analytics in minutes with no code changes.
    developer-tools
    analytics
    open-source
    data-engineering
  • LanceDB
    LanceDB (W22) • Active • 4 employees • San Francisco
    LanceDB is a new open-source vector database that can support low-latency billion-scale vector search on a single node. Built around a new columnar data format, LanceDB makes it incredibly easy to build applications for generative AI, recsys, search engines, content moderation, and more.
    aiops
    machine-learning
    data-engineering
  • Elementary
    Elementary (W22) • Active • 2 employees • Tel Aviv-Yafo, Israel
    Elementary enables data teams to detect problems in their data before their users do. An open-source solution that any data engineer can deploy in minutes without sharing sensitive data.
    developer-tools
    analytics
    open-source
    data-engineering
  • DynamoFL
    DynamoFL (W22) • Active • 20 employees • San Francisco
    DynamoFL is the most private solution for enterprise AI. Achieve best-in-class, compliant AI at a fraction of the time and cost.
    machine-learning
    privacy
    data-engineering
  • Sarus
    Sarus (W22) • Active • 16 employees • Paris, France
    Sarus solves the problem of accessing or sharing personal data for analytics or machine learning. The solution deploys natively in data infrastructures and lets practitioners work on data they cannot see. Every interaction with the sensitive data is protected with the highest privacy standard: differential privacy. Sarus makes traditional anonymization methods irrelevant, saving months in compliance and data engineering while preserving all of the value of data.
    analytics
    compliance
    data-engineering
  • Toolchest
    Toolchest (W22) • Active • 3 employees • Mountain View
    Toolchest makes it easy for bioinformaticians to run popular computational biology software in the cloud. Drug discovery companies use Toolchest to get analysis results up to 100x faster. We have Python and R libraries that customers use to run popular open-source tools at scale in the cloud. Toolchest can be used wherever a customer's analysis currently lives, e.g., a Jupyter notebook on their laptop, an R script on an on-prem cluster, or a Python script in the cloud.
    developer-tools
    drug-discovery
    data-engineering
  • Preloop
    Preloop (W24) • Active • 2 employees
    Preloop is a feature platform that aims to automate the cataloging and productionization of features across diverse ML and data science workflows. We believe that current feature platforms require too much additional work; Preloop aims to make the process more automated and reliable. Our platform will automate the process of creating features that are fresh, and make it easy to build the infrastructure required to serve them. As more people recognize the benefits and practicality of using AI, we believe that the importance of a quick and easy to use feature platform will continue to rise.
    artificial-intelligence
    data-science
    data-engineering
    enterprise-software
  • OmniAI
    OmniAI (W24) • Active • 2 employees • New York
    OmniAI provides a foundational data infrastructure layer for AI-driven applications. Search and derive instant benefits from unstructured data across your entire data architecture. • No-code connectors to ingest data from any source into a central warehouse (Postgres, MongoDB, Google Drive) • Transform unstructured data into organized, structured formats • Merge semantic search capabilities with conventional search and ranking methods to enhance RAG applications
    artificial-intelligence
    big-data
    data-engineering
  • PeerDB
    PeerDB (S23) • Active • 2 employees
    At PeerDB, we are building the most cost-effective way to stream data from Postgres to data warehouses, queues, and storage engines, one that is also fast and simple. If you are running Postgres at the heart of your data stack and moving data at scale from Postgres to any of these targets, PeerDB can provide value. We support different modes of streaming: log-based (CDC), cursor-based (timestamp or integer), and XMIN-based. Performance-wise, we are 10x faster than existing tools. Feature-wise, we support native Postgres features such as a comprehensive set of data types (including jsonb, arrays, and PostGIS), efficient streaming of TOAST columns, schema changes, and more.
    developer-tools
    open-source
    data-engineering
    enterprise-software
    databases
  • Whaly
    Whaly (S21) • Active • 3 employees • Paris, France
    Whaly helps data teams save time on maintenance and on building analyses, while making business users more autonomous in the analyses they need for decision-making. We do this by providing a self-service data platform where data and business teams can work together. We saw that most data teams were becoming a bottleneck for the rest of the company and needed to give business teams more autonomy to back their decisions with data. Emilien, Florian and Pierre were the minds behind the data advertising platforms of major media and e-commerce companies in France, in earlier roles as Product Manager and Head of Customer Success, which gives them an edge in executing data projects successfully.
    data-engineering
  • CustomerOS
    CustomerOS (S22) • Active • 10 employees • London, United Kingdom
    The top 10% of SaaS companies generate 87% of all market returns. CustomerOS gives you the data and tooling to compete with the top 10%. Specifically, we solve three major problems in B2B SaaS today: 1. CustomerOS is a system of record for all your customer data. We support 100+ integrations with any app or database that touches customer data, and no engineering is required. 2. CustomerOS provides tooling for your in-life customer motion. We predict renewals (and churn), provide risk-weighted ARR forecasts, and manage all your Customer Success workflows, from onboarding to expansion to advocacy. 3. CustomerOS lead-scores your pipeline against your ICP. We build data-driven profiles of your best customers and provide a real-time ICP-fit indicator on your sales and marketing pipeline. This ensures you're spending your CAC acquiring customers who are primed to renew year after year and expand as they grow.
    b2b
    customer-success
    open-source
    enterprise
    data-engineering
  • Patterns
    Patterns (S21) • Active • 7 employees • San Francisco
    The fastest way to generate data-driven analyses. Let everyone on your team analyze data on their own, enabled by AI analysts that understand your business and know how to query your data.
    analytics
    data-science
    data-engineering
    data-visualization
  • Lariat Data
    Lariat Data (S21) • Active • 2 employees • New York
    Lariat is a Continuous Data Quality monitoring platform to discover data bugs before your consumers do. Ensure data products don’t break even as business logic, input data and infrastructure change. Use Lariat to define and then automatically extract, store and visualize data quality metrics on raw event-level data through to delivered data products.
    machine-learning
    big-data
    data-engineering
  • Waydev
    Waydev (W21) • Active • 15 employees • San Francisco
    Leverage insights from your engineering stack to accelerate velocity, align engineering work to business priorities, and increase visibility into your team’s DORA Metrics and SPACE Framework Metrics.
    analytics
    enterprise
    data-engineering
  • DAGWorks Inc.
    DAGWorks Inc. (W23) • Active • 2 employees • San Francisco
    At DAGWorks Inc. our goal is to change how data + ML + LLM teams are staffed and operate. We’re building an open core SaaS platform to streamline development and operation of data, ML, & LLM pipelines in a collaborative, self-service manner, utilizing a company's existing MLOps and data infrastructure. We believe self-service for data practitioners is the future because it gives domain modeling experts the velocity to iterate on pipelines and models without hand-offs, which is key for businesses using ML/AI to differentiate themselves. Unless you’re a big tech company or someone like Stitch Fix that can afford a platform team, your only options are staffing teams with high ratios of engineers or finding unicorn data scientists who can build pipelines; this not only slows time to value, it also makes operating ML/AI expensive. We’re here to change that. Think simple Python with a low software engineering bar for describing what should happen, which, given some extra metadata, generates the workflow code, plus a platform that consolidates several MLOps tools, all in a self-service manner. It’s functional and usable by junior and senior folks alike.
    developer-tools
    machine-learning
    b2b
    open-source
    data-engineering
  • HomeRoom
    HomeRoom (W22) • Active • 25 employees • San Jose, CA
    Homeroom helps investors provide affordable housing while making a 22% ROI. We do this by sourcing properties, arranging capital, managing construction, vetting tenants, and collecting rent by the room. To date, Homeroom has brought on 85 property investors, is growing 6X annually, and is bringing in $420K in annualized net revenue. How it works: we help investors buy homes in cities that are attractive to young people but lack affordable housing options. We then renovate, and after about 20 days the home is ready and we find qualified renters by the room. We launched in 2018 in Kansas City with 1 home. We now have 105 homes in 31 cities. In 2021, we grew rental GMV to $1.8M (300% YoY growth). Our average rent across every property is $458, which is about 50% lower than market comps, and our investors see returns up to 50% higher. We are HomeRoom. Johnny is the financial analyst/domain expert, Thomas is a serial entrepreneur with a PhD in ML, and Mike hacked growth for Airbnb and Facebook.
    machine-learning
    real-estate
    proptech
    nlp
    data-engineering
  • autotab
    autotab (S23) • Active • 1 employee • New York
    Autotab is a Chrome extension that writes Selenium code to mirror your actions as you navigate the browser. You can copy that code into your own project or use our starter GitHub repo to get your automation up and running in <5 minutes: https://github.com/Planetary-Computers/autotab-starter. Formerly known as ZTool.
    api
    data-engineering
    automation
    ai
  • Clear
    Clear (W21) • Active • 3 employees • London, United Kingdom
    Clear is a free mobile app that helps you track and share your skincare routine. We are fuelling innovation and empowering consumers in the skincare industry via data, technology and community. We were also the 2022 L'Oréal Beauty Tech for Good winners, and were featured under "Best New Apps and Updates" on the iOS App Store in 2023. The skincare industry is worth $200B and social commerce is going to drive the future growth of every brand in the industry.
    fintech
    marketplace
    consumer
    digital-health
    data-engineering
  • Cargo
    Cargo (S23) • Active • 3 employees • Paris, France
    Cargo is the first revenue architecture built for modern teams. We help revenue teams to access their company data and automate their sales operations. We provide a headless interface to enable them to easily segment, score and route leads to turn pipeline into revenue.
    sales
    sales-enablement
    data-engineering
    infrastructure
    operations
  • Honeydew
    Honeydew (W23) • Active • 6 employees • Tel Aviv-Yafo, Israel
    The way people use data is constantly changing. Data teams must support every new context without breaking the shared truth. Honeydew’s semantic layer does it automatically. We validate each change and update every data flow. Using Honeydew, data teams can support 10x more data users - without more engineers or compromising integrity.
    saas
    b2b
    analytics
    data-engineering
  • Evidence
    Evidence (S21) • Active • 6 employees • Toronto, Canada
    Evidence is an open source, code-based alternative to drag-and-drop BI tools. Build polished data products with just SQL and markdown.
    developer-tools
    b2b
    data-engineering
  • Cedalio
    Cedalio (S23) • Active • 6 employees • San Francisco
    With Cedalio, developers can easily store data with the same scalability and developer experience as the traditional cloud, but with built-in transparency, security, and verifiability. Everything that happens on the database leaves an encrypted historical record of transactions on the blockchain that cannot be tampered with.
    developer-tools
    climate
    supply-chain
    data-engineering
  • Tarsal
    Tarsal (S21) • Active • 3 employees • San Francisco
    Tarsal is the first data pipeline built for security teams. It's Fivetran, but for security data. Tarsal provides: - one-click ingestion and normalization for all security logs (e.g., multi-cloud infra, Okta/Duo, Slack, CrowdStrike) - normalization across sources for easy correlation - a vendor-agnostic pipeline so you can use the best log destination for the job (supported destinations include Snowflake, S3, Databricks, Splunk, Datadog, etc.)
    b2b
    cybersecurity
    big-data
    data-engineering
  • TableFlow
    TableFlow (W23) • Active • 2 employees • San Francisco
    TableFlow is an open source data import platform for companies to collect and transform customer data. Instead of building an in-house file upload and processing service, businesses can embed or link to TableFlow's customizable importer to manage their data onboarding needs.
    artificial-intelligence
    developer-tools
    saas
    open-source
    data-engineering
  • Outerbase
    Outerbase (W23) • Active • 4 employees • Pittsburgh, PA
    Outerbase is the interface for your database. Companies use Outerbase to view, edit, and modify their data and even generate beautiful visual dashboards without having to write a single line of SQL.
    developer-tools
    generative-ai
    analytics
    data-engineering
    ai
  • Logarithm Labs
    Logarithm Labs (W20) • Active • 2 employees • Foster City, CA
    Easy button to use data for your daily operations. Power your business workflows with quality data. Logarithm Labs helps you turn manual data wrangling and ad-hoc scripts into repeatable pipelines for your operational workflows. Our product and team of experts do the heavy lifting so that you can focus on the business logic that drives your organization. To learn more, contact us at hello@logarithmlabs.com.
    developer-tools
    data-engineering
  • Operator Labs
    Operator Labs (W20) • Active • 6 employees • New York
    Easily generate reports from on-chain data
    generative-ai
    crypto-web3
    data-engineering
  • communion
    communion (S19) • Active • 8 employees • New York
    creative tools + powerful analytics
    artificial-intelligence
    marketing
    advertising
    data-engineering
    ai-assistant
  • SwiftSku
    SwiftSku (W21) • Active • 35 employees • San Francisco
    SwiftSku connects the $650B convenience store industry with management and analytics. SwiftSku’s app connects to point of sales at convenience stores in real time, enabling owners to remotely manage and monitor their stores. We take the guesswork out of running a convenience store with predictive analytics, dashboards, and reports. SwiftSku's CEO, Mit Patel, grew up managing the inventory, pricebook, and reporting of his family’s convenience stores, and, when vendors would come by, he’d bridge the language barrier as a translator. More than 85% of independent convenience stores are owned by Indian families like Mit’s. Solving convenience store owners' pains of today leads to SwiftSku's greater vision of optimizing the supply chain, facilitating a retailer-agnostic consumer-to-brand relationship, and providing real-time insights to brands and retailers.
    saas
    b2b
    analytics
    retail
    data-engineering
  • LaunchFlow
    LaunchFlow (W23) • Active • 2 employees
    LaunchFlow is the fastest way to build and deploy Python applications on the cloud. Our platform provides developers with the framework, tools, and infrastructure needed to build scalable, more reliable Python applications.
    developer-tools
    machine-learning
    b2b
    data-engineering
    cloud-computing
  • Prequel
    Prequel (W21) • Active • 5 employees • New York
    Prequel makes it easy for companies to share data with their customers. It helps you export data directly to your customer's Snowflake, Redshift, BigQuery, Databricks, or other data warehouse on an ongoing basis.
    saas
    analytics
    data-engineering
  • Avenue
    Avenue (W21) • Active • 8 employees • New York
    Avenue is a simple way for business teams to set up alerts from their database or data warehouse. Think Datadog / PagerDuty for operations teams. Operations teams create set-and-forget alerts on all their data, so they can be more proactive with their time (and monitor on more nuanced triggers than just what fits on their dashboard page). Avenue can improve response times to critical problems from several days to real-time by alerting directly on the data sources that customers already use.
    developer-tools
    saas
    data-engineering
  • Secoda
    Secoda (S21) • Active • 22 employees • Toronto, Canada
    Secoda is a universal data discovery and documentation tool that makes finding metadata, queries, charts, and documentation as easy as a Google search. Today, data teams are collecting tons of data, but most employees don't know what data exists, how to use it, or which data to trust. This confusion happens because different components of company data get collected in fragmented tools. Secoda helps teams find and understand data in one easy-to-use platform that's accessible to any employee.
    developer-tools
    saas
    b2b
    analytics
    data-engineering
  • Chaos Genius
    Chaos Genius (W20) • Active • 10 employees • San Francisco
    Chaos Genius is a DataOps Observability platform for Snowflake. Enable Snowflake Observability to reduce Snowflake costs and optimize query performance.
    cloud-workload-protection
    machine-learning
    analytics
    open-source
    data-engineering
  • Etleap
    Etleap (W13) • Active • 11 employees • San Francisco
    Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike other enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own.
    data-engineering
  • Polytomic
    Polytomic (W20) • Active • 7 employees • San Francisco
    Polytomic is a no-code web app to sync data between your internal databases, business systems (e.g., Stripe, Salesforce), data warehouses, spreadsheets, and even HTTP APIs.
    saas
    b2b
    data-engineering
  • Dataland
    Dataland (S20) • Active • 2 employees • New York
    Dataland lets internal teams search tables in Snowflake, BigQuery, and Postgres at extreme speed. Full-text search on billion-row tables finishes in under 1 second, if not under 0.5 seconds. It's 500x faster and cheaper than the status quo (e.g., Retool on Snowflake). Dataland comes with a beautifully designed UI, so any business user can get the answers they need from massive datasets. Data engineers no longer have to build one-off, slow tools just for database lookups.
    saas
    b2b
    data-engineering
  • Imbue (formerly Generally Intelligent)
    Imbue (formerly Generally Intelligent) (S17) • Active • 15 employees • San Francisco
    Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents. We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.
    machine-learning
    data-engineering
    ai
  • Datafold
    Datafold (S20) • Active • 24 employees • New York
    Datafold exists to make working with data more enjoyable and productive. We are all about empowering data and analytics engineers. We find the most tedious, error-prone, and repetitive tasks and create tools to automate them. We make the world better by giving superpowers to data professionals who solve hard problems in various domains with data.
    saas
    analytics
    data-engineering
  • Jitsu
    Jitsu (S20) • Active • 4 employees • San Francisco
    Jitsu is the fastest, most durable way to collect event data from every source - web, app, email, chatbot, CRM - into your data warehouse. 100% open-source. Purpose built, secure and ready in minutes.
    saas
    b2b
    open-source
    data-engineering
  • Mozart Data
    Mozart Data (S20) • Active • 24 employees • San Francisco
    Mozart Data provides an out-of-the-box modern data stack that empowers anyone to easily consolidate, organize, and prepare their data for analysis. Spin up a data stack that’s built on a best-in-class data warehouse and ETL tool in hours, without any engineering. You can finally spend more time on generating insights and less time wrangling your data.
    saas
    b2b
    data-engineering
  • Acho
    Acho (W20)Active • 15 employees • Boston
    Acho is a Data App Development Platform, powered by AI. This platform enables teams to transform business data into mission-critical applications used for automation, business intelligence, data science, internal tools, and customer-facing products. Today, Acho plays a pivotal role in elevating operational efficiency, automating workflows, and turning data into products for over 100 businesses. Among our valued customers are supply chain divisions of major global corporations, IT departments of Online Travel Agencies, Finance & Accounting units of prestigious banking institutions, and other organizations that play a key role in our daily life.
    saas
    data-engineering
    enterprise-software
    cloud-computing
    infrastructure
  • TetraScience
    TetraScience (S15)Active • 100 employees • Boston
    TetraScience provides the world’s first and only R&D Data Cloud, with a mission to transform life sciences R&D, accelerate discovery, and improve human life. Scientists at global pharma and biotech organizations rely on our innovative Tetra Data Platform for easy access to centralized, harmonized, and actionable scientific data to accelerate their digital lab transformation. With best-in-class SaaS performance, a team of industry innovators, and excellent product/market fit, Tetra is positioned to become an iconic life sciences software company.
    saas
    data-engineering
  • Streamdal
    Streamdal (S20)Active • 9 employees • Portland, OR
    SaaS data platform for observing, repairing and replaying data in streaming systems.
    developer-tools
    data-engineering
    devops
  • Converge
    Converge (S23)Active • 3 employees • San Francisco
    Tracking customer events (e.g. Add To Cart, Purchase, etc.) correctly is important, yet unattainable for most online stores due to the limitations of tracking in the browser and a lack of in-house developers. Converge auto-tracks all important events – across the browser, store backend and subscription platforms. Once tracking is set up, Converge allows online stores to forward these events with the flip of a switch to their advertising platforms and analytics tools, leading to improved ad performance and better insights. Our larger vision is to go beyond data infrastructure and leverage our single customer data layer to build out a perfectly integrated set of applications that helps brands reduce their customer acquisition cost.
    saas
    analytics
    e-commerce
    data-engineering
    infrastructure
  • Lume
    Lume (W23)Active • 3 employees • New York
    Lume is an AI tool to generate and maintain custom data integrations. Lume uses AI to automatically transform data between any start and end schema, and pipes the data directly to your desired destination.
    artificial-intelligence
    generative-ai
    saas
    b2b
    data-engineering
  • Versori
    Versori (W23)Active • 8 employees • Manchester, United Kingdom
    Versori Switchboard allows anyone, regardless of technical level, to develop and build integrations, migrations, workflows and transformations on one clean platform. By enabling medium to large enterprises to cut significant costs and timeframes for large data processing events, Versori aims to become synonymous with critical data infrastructure.
    saas
    b2b
    api
    no-code
    data-engineering
  • authzed
    authzed (W21)Active • 12 employees • New York
    We build the tools companies need to provide performant and scalable authorization for their applications. We were founded by three successful entrepreneurs with expertise in enterprise software, most recently as leaders at Red Hat. Jake and Joey met on the APIs team at Google in 2010. They went on to found Quay, where Jimmy joined as their first hire. Over the past decade, they’ve changed the landscape for building and deploying software.
    developer-tools
    saas
    security
    open-source
    data-engineering
  • Egress
    Egress (S23)Active • 2 employees • San Francisco
    Egress is the AI layer for company data. It allows anyone to transform and take action on data in their warehouse or database using natural language. For example, Egress has helped several companies identify high-propensity users from product data and convert them using personalized outreach campaigns.
    artificial-intelligence
    data-engineering
  • OneSchema
    OneSchema (S21)Active • 10 employees • San Francisco
    Product and engineering teams use OneSchema to save the months of development time it takes to build a CSV importer. OneSchema improves customer activation / import completion rates by automatically correcting customer data.
    developer-tools
    saas
    b2b
    data-engineering
  • Neum AI
    Neum AI (S23)Active • 2 employees • Seattle, WA
    Neum AI does for vector embeddings what Fivetran and Airbyte do for traditional data. But Neum AI goes beyond data loading and cleansing and is best-of-breed in optimizing the creation and real-time synchronization of vector embeddings at a massive scale. Our vision includes handling vector replication across data stores and embedding management for auditing and compliance. To get in touch with us send an email to founders@tryneum.com or book time in our Calendly: https://calendly.com/neum-ai/neum-ai-demo
    developer-tools
    generative-ai
    b2b
    data-engineering
    infrastructure
  • Serra
    Serra (S23)Active • 2 employees • San Francisco
    Serra is Tableau for data infrastructure. Serra enables smaller, less-technical teams to build cloud data infrastructure—batch and real-time data pipelines, rapid SQL analytics, and scalable data science and ML—through a user-friendly dashboard.
    developer-tools
    big-data
    data-engineering
    ai
  • DataShare
    DataShare (S23)Active • 1 employee • Austin, TX
    DataShare is a data-as-a-service platform that lets you embed charts, dashboards and exports directly into your product. For example, if you run an accounting startup, DataShare would enable you to embed a full profit and loss dashboard, with downloadable statements. DataShare is backed by an enterprise-grade data warehouse, and can be implemented in fewer than 20 lines of code.
    analytics
    data-engineering
    databases
  • Metaplane
    Metaplane (W20)Active • 12 employees • Boston
    Metaplane ensures everyone trusts the data that powers your business. Data teams at Bose, Ramp, and Klaviyo use our data observability platform to prevent and detect data issues — before the CEO pings them about weird revenue numbers. We do this with ML-based anomaly detection, end-to-end column-level lineage, and tools to help prevent incidents before they occur. You can monitor your entire data stack within 30 minutes. The company is backed by Khosla Ventures, Y Combinator, and the founders of Okta, HubSpot, and Vercel.
    developer-tools
    saas
    data-engineering
  • Stacksync
    Stacksync (W24)Active • 4 employees • San Francisco
    Stacksync powers real-time, bidirectional data synchronization between CRMs (e.g. Salesforce, HubSpot or SAP) and databases (e.g. Postgres, Google BigQuery, etc.). Edits made in your CRM instantly update in your database, and vice versa. To set up a sync, users simply connect the two apps in one click and select the tables they want to sync, with no code required. Stacksync reduces implementation delays for CRM integration projects from months to minutes and removes the complexity behind developing new CRM features. We show a 90% improvement in delivery time and budget.
    b2b
    api
    crm
    data-engineering
    databases
  • Narrator
    Narrator (S19)Active • 8 employees • New York
    Narrator is an end-to-end platform built on top of the Activity Schema data standard, starting at $500/mo. Data analysts can define their user journeys and use those journeys to answer any question that comes up. From there, data can be visualized in a dashboard, used to build a story-like analysis, exported and more. Narrator's biggest value is speed and cost reduction: small teams can move fast and answer questions in minutes, allowing them to perform the work of very large data teams, while Narrator is optimized to minimize the warehouse's compute costs.
    analytics
    big-data
    data-engineering
  • Satsuma
    Satsuma (S21)Acquired • 5 employees • San Francisco
    Satsuma is a developer tool for building applications on top of real-time blockchain data. Our product lets developers take decoded data from multiple chains, customize it for their use cases, and access it through API endpoints. Blockchains serve as distributed databases for these products, holding their most important data. However, it’s difficult to access and query that data. We believe this friction is an enormous blocker for web3 developers and that better tooling will enable mass adoption for web3. We’re a founding team of engineers, having built data infrastructure and product as early employees at Airtable, Heap, and Y Combinator.
    developer-tools
    saas
    crypto-web3
    data-engineering
  • Stackshine
    Stackshine (W22)Acquired • 7 employees • Portland, OR
    Stackshine is creating mission control for enterprise IT teams. We discover all the software being used across their organization and then automate workflows related to onboarding/offboarding, cost savings, and security.
    robotic-process-automation
    productivity
    analytics
    enterprise
    data-engineering
  • Data Mechanics
    Data Mechanics (S19)Acquired • 25 employees • Paris, France
    Data Mechanics was acquired by NetApp in 2021 and integrated into the Spot.io product portfolio. Our managed Spark-on-Kubernetes platform is live and running under the name Ocean for Apache Spark: https://spot.io/products/ocean-apache-spark/
    saas
    b2b
    open-source
    data-engineering
  • Yhat
    Yhat (W15)Acquired • 17 employees • NY
    Yhat (YC W15, pronounced y-hat) was an end-to-end data science platform. It was acquired by Alteryx (NYSE: AYX).
    artificial-intelligence
    machine-learning
    enterprise
    data-engineering