Data Products

An overview of the product categories used in a modern data architecture, with representative products for each category.

Event Streaming

Product Website Open Source Other References
Apache Kafka https://kafka.apache.org/ https://github.com/apache/kafka X/Twitter
Confluent https://www.confluent.io/ https://github.com/confluentinc Training, X/Twitter
Conduktor https://www.conduktor.io/ https://github.com/conduktor/ X/Twitter
Amazon Kinesis https://aws.amazon.com/kinesis/ N/A
Azure Event Hubs https://azure.microsoft.com/en-us/products/event-hubs/ N/A Connections overview
GCP Datastream https://cloud.google.com/datastream N/A

Pub/Sub Messaging

Product Website Open Source Other References
Apache Pulsar https://pulsar.apache.org/ https://github.com/apache/pulsar
Google Pub/Sub https://cloud.google.com/pubsub N/A
Amazon SQS https://aws.amazon.com/sqs/ N/A
StreamNative (Pulsar) https://streamnative.io/ N/A Hub
Pandio (Pulsar) https://pandio.com/apache-pulsar-as-a-service/ N/A Free Trial

Event Data Format / Binary Encoded Format

Product Website Open Source Other References
Avro https://avro.apache.org/ https://github.com/apache/avro X/Twitter
JSON Schema https://json-schema.org/ https://github.com/json-schema-org Slack, Avro to JSON Schema
JSON https://www.json.org/ N/A Linter
ProtoBuf https://developers.google.com/protocol-buffers https://github.com/protocolbuffers/protobuf/
Thrift https://thrift.apache.org/ https://github.com/apache/thrift

Runtime Registry

Product Website Open Source Other References
Apicurio Registry https://www.apicur.io/registry/ https://github.com/apicurio/apicurio-registry

Change Data Capture (CDC)

Product Website Open Source Other References
Debezium https://debezium.io/ https://github.com/debezium X/Twitter

Event Stream Processing

Product Website Open Source Other References
Apache Flink https://flink.apache.org/ https://github.com/apache/flink/
Apache Beam https://beam.apache.org/ https://github.com/apache/beam/
Hazelcast https://hazelcast.com/
Apache Samza https://samza.apache.org/ https://github.com/apache/samza/
Apache Spark https://spark.apache.org/ https://github.com/apache/spark/ X/Twitter
Apache Storm https://storm.apache.org/ https://github.com/apache/storm/ X/Twitter
RudderStack https://www.rudderstack.com/ X/Twitter
Striim https://www.striim.com/ Free Trial, X/Twitter

Real-time Data Processing

Product Website Open Source Other References
Decodable https://www.decodable.co/ X/Twitter

Data Pipelines - Batch Processing

Product Website Open Source Other References
Dagster https://dagster.io/ https://github.com/dagster-io/dagster Slack
Prefect https://www.prefect.io/ https://github.com/prefecthq/prefect X/Twitter
Orchest https://www.orchest.io/ https://github.com/orchest/orchest Slack
Upsolver https://www.upsolver.com/ N/A Free Trial
Fivetran https://www.fivetran.com/ N/A Demo, 14-day trial
Apache Airflow https://airflow.apache.org/ https://github.com/apache/airflow/ X/Twitter
Astronomer (Airflow) https://www.astronomer.io/ https://github.com/astronomer X/Twitter
Gathr https://www.gathr.one/ Free Plan, X/Twitter

Data Transformation

Product Website Open Source Other References
dbt https://www.getdbt.com/ https://github.com/dbt-labs/dbt-core Docs
Snowpark https://www.snowflake.com/snowpark/

Extract - Transform - Load (ETL)

Product Website Open Source Other References
Matillion https://www.matillion.com/ N/A X/Twitter, Demo
Airbyte https://airbyte.com/ https://github.com/airbytehq/airbyte Slack
Hevo https://hevodata.com/pipeline/ X/Twitter

Data Catalog

Product Website Open Source Other References
data.world https://data.world/ N/A Free Community Version
Alation https://www.alation.com/ N/A
Collibra https://www.collibra.com/ N/A

Data Lineage

Product Website Open Source Other References
OpenLineage https://openlineage.io/ https://github.com/OpenLineage/OpenLineage Slack

Data Fabric

Product Website Open Source Other References
Talend https://www.talend.com/ N/A

Data Storage Format

Product Website Open Source Other References
Parquet https://parquet.apache.org/ https://github.com/apache/parquet-mr/ Slack, X/Twitter
Apache Arrow https://arrow.apache.org/ https://github.com/apache/arrow X/Twitter

SQL Query Engine

Product Website Open Source Other References
Apache Iceberg https://iceberg.apache.org/ https://github.com/apache/iceberg Slack
Trino https://trino.io/ https://github.com/trinodb/trino Slack, X/Twitter
Apache Drill https://drill.apache.org/ https://github.com/apache/drill Slack, X/Twitter
Starburst https://www.starburst.io/ X/Twitter
Amazon Athena https://aws.amazon.com/athena N/A
Pandio (Trino) https://pandio.com/managed-trino-as-a-service/ N/A Free Trial
Plywood PlyQL https://plywood.imply.io/plyql https://github.com/implydata/plyql Deprecated in favor of Druid SQL
Amazon Redshift Spectrum https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html N/A
Dolt https://www.dolthub.com/ https://github.com/dolthub/dolt X/Twitter
Flatbase https://flatbase.io/ N/A X/Twitter

Data Analytics / Business Intelligence

Product Website Open Source Other References
Power BI https://powerbi.microsoft.com/ N/A
Tableau https://www.tableau.com/ N/A X/Twitter, Public X/Twitter, Free Trial
Looker https://www.looker.com/
Sisense https://www.sisense.com/ Free Trial, X/Twitter
StarTree https://www.startree.ai/ N/A X/Twitter, Slack
Informatica https://www.informatica.com/solutions/power-cloud-analytics.html N/A X/Twitter
Qlik Sense https://www.qlik.com/us/products/qlik-sense N/A X/Twitter
Cascade https://www.cascade.io/ X/Twitter
Alteryx Designer https://www.alteryx.com/ X/Twitter, Free Trial
Cloudbutton https://cloudbutton.eu/ https://github.com/cloudbutton X/Twitter
Whaly https://whaly.io/ X/Twitter
Zing https://www.getzingdata.com/ X/Twitter
Zoho Analytics https://www.zoho.com/analytics/ X/Twitter
Workstream https://www.workstream.io/
Azure Data Explorer https://azure.microsoft.com/en-us/products/data-explorer/ N/A Documentation, API, Data Formats

Data Visualization

Product Website Open Source Other References
Google Data Studio https://datastudio.google.com/
Google Charts https://developers.google.com/chart
Observable https://observablehq.com/ https://github.com/observablehq X/Twitter
Plotly https://plotly.com/ https://github.com/plotly X/Twitter
Highcharts https://www.highcharts.com/ https://github.com/highcharts X/Twitter
Chartbeat https://chartbeat.com/ X/Twitter
Redash https://redash.io/ https://github.com/getredash X/Twitter
Databox https://databox.com/ X/Twitter
Datawrapper https://www.datawrapper.de/ https://github.com/datawrapper X/Twitter
Infogram https://infogram.com/ X/Twitter
Domo https://www.domo.com/business-intelligence/reporting-dashboards X/Twitter
Preset https://preset.io/ X/Twitter
Yellowfin https://www.yellowfinbi.com/suite/data-visualization X/Twitter

Data Management

Product Website Open Source Other References
DataOps (Snowflake) https://www.dataops.live/ X/Twitter, eBook, Free Trial
Apache Calcite https://calcite.apache.org/ X/Twitter

Data Warehouse

Product Website Open Source Other References
Google BigQuery https://cloud.google.com/bigquery
Amazon Redshift https://aws.amazon.com/pm/redshift/
Snowflake https://www.snowflake.com/
Vertica https://www.vertica.com/ X/Twitter

OLAP Data Store

Product Website Open Source Other References
Apache Druid https://druid.apache.org/ https://github.com/apache/druid/ https://twitter.com/druidio
Apache Pinot https://pinot.apache.org/ https://github.com/apache/pinot X/Twitter
ClickHouse https://clickhouse.com/ https://github.com/ClickHouse/ClickHouse X/Twitter
Rockset https://rockset.com/ X/Twitter
SingleStore https://www.singlestore.com/ https://github.com/memsql/

Storage Service

Product Website Open Source Other References
Apache BookKeeper https://bookkeeper.apache.org/ https://github.com/apache/bookkeeper/ X/Twitter
Amazon S3 https://aws.amazon.com/s3/
Google Cloud Storage https://cloud.google.com/storage
Azure Blob Storage https://azure.microsoft.com/en-us/products/storage/blobs/
Ceph https://ceph.io/en/ https://github.com/ceph/ceph Community

NoSQL Database

Product Website Open Source Other References
Apache Cassandra https://cassandra.apache.org/ https://github.com/apache/cassandra X/Twitter
DataStax https://www.datastax.com/ https://github.com/datastax/ X/Twitter
MongoDB https://www.mongodb.com/ https://github.com/mongodb/ X/Twitter
Amazon DynamoDB https://aws.amazon.com/dynamodb/ N/A
Azure Cosmos DB https://azure.microsoft.com/products/cosmos-db/ N/A
ScyllaDB https://www.scylladb.com/ https://github.com/scylladb/scylladb X/Twitter, Slack

Time-series Database

Product Website Open Source Other References
Timescale https://www.timescale.com/ https://github.com/timescale/timescaledb X/Twitter, Slack
TempoDB https://tempo-db.com/ N/A X/Twitter
InfluxData https://www.influxdata.com/ https://github.com/influxdata X/Twitter, Community
M3 https://m3db.io/ https://github.com/m3db/m3 Slack

Real-time Database

Product Website Open Source Other References
Materialize https://materialize.com/ https://github.com/MaterializeInc/materialize X/Twitter
Imply Polaris https://imply.io/imply-polaris/ https://github.com/implydata Community, Free Trial
Aerospike https://aerospike.com/ X/Twitter

In-memory Database

Product Website Open Source Other References
Infinispan https://infinispan.org/ N/A X/Twitter

Relational OLTP Database

Product Website Open Source Other References
MySQL https://www.mysql.com/ https://github.com/mysql X/Twitter
PostgreSQL https://www.postgresql.org/ https://github.com/postgresql X/Twitter, Planet Blog
Percona https://www.percona.com/ https://github.com/percona X/Twitter
MariaDB https://www.mariadb.com/ https://github.com/MariaDB X/Twitter
Oracle https://www.oracle.com/ X/Twitter

Data Lake / Data Lakehouse

Product Website Open Source Other References
Databricks https://www.databricks.com/ N/A X/Twitter
Delta Lake https://delta.io N/A X/Twitter, Slack
Dremio https://www.dremio.com/ N/A Includes Free Standard Edition

Data Stack

Product Website Open Source Other References
Keboola https://www.keboola.com/ N/A X/Twitter

Reverse ETL

Product Website Open Source Other References
Census https://www.getcensus.com/ N/A Includes Free Edition

Data Cleansing

Product Website Open Source Other References
Astera https://discover.astera.com/ N/A X/Twitter

Data Insights

Product Website Open Source Other References
Accern (NLP) https://accern.com/ N/A
Snowplow https://snowplow.io/ N/A X/Twitter

Data Generation

Product Website Open Source Other References
Mockaroo https://www.mockaroo.com/ N/A X/Twitter

Notebook Extensions

Product Website Open Source Other References
Hex https://hex.tech/ N/A X/Twitter
Einblick https://www.einblick.ai/ N/A X/Twitter