Glossary

The Synnada glossary explains key terms and concepts in data science, machine learning, AI, and analytics. Learn about popular ML algorithms, data engineering, statistics and more from our comprehensive tech glossary.

Algorithms/Data Structures

Skip List

A skip list is a probabilistic data structure that provides fast search and insertion (expected O(log n)) over an ordered sequence by using a hierarchy of linked lists to skip over elements.
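
To make the "hierarchy of linked lists" concrete, here is a minimal sketch of a skip list. The class names, the `MAX_LEVEL` cap of 16, and the promotion probability of 0.5 are illustrative assumptions, not taken from any particular library.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        # forward[i] is the next node in the level-i linked list.
        self.forward = [None] * level


class SkipList:
    MAX_LEVEL = 16  # assumed cap on tower height

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        # Promote a node one level up with probability 1/2 (assumed).
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # update[i] is the last node before the insertion point on level i.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        # Start at the sparsest list and drop down a level at each overshoot.
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```

The upper levels act as express lanes: a search skips long runs of elements there before descending to the full list at level 0.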

B-tree

A B-tree is a tree data structure optimized for fast indexed key lookups and writes on disk storage while keeping the tree balanced.

Distributed Hash Table

A distributed hash table (DHT) is a decentralized distributed system that partitions a key space across nodes and uses hash functions to assign ownership and locate data.
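
One common way to assign key ownership is consistent hashing on a ring. The sketch below covers only that ownership step; real DHTs also need routing and replication, which are omitted. `HashRing`, `ring_position`, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Map a string to a point on the ring using the first 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical helper: each key is owned by the next node clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node position strictly after the key, wrapping around at the end.
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))  # one of the three node names
```

With this scheme, removing a node only reassigns the keys that node owned; every other key keeps its owner.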

Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure for testing set membership without storing the full set; it can return false positives but never false negatives.
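
A minimal sketch of the idea follows. The sizes (`num_bits=1024`, `num_hashes=3`) and the trick of deriving several hash functions by salting SHA-256 are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _indexes(self, item: str):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for i in self._indexes(item):
            self.bits[i] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[i] for i in self._indexes(item))
```

Every added item sets all of its bit positions, so a lookup for it always returns True. An item that was never added can still find all of its positions set by other items, which is what produces a false positive.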

Interval arithmetic

Interval arithmetic is a method of computing with sets of numbers rather than single values, representing uncertainty in calculations and accounting for rounding errors.
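
As a small illustration, the sketch below defines a hypothetical `Interval` type with addition, subtraction, and multiplication. It ignores outward rounding of the endpoints, which a rigorous implementation would need in order to account for floating-point error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result: our low minus their high; largest: our high minus their low.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Signs can flip the ordering, so take min and max over all endpoint products.
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(min(products), max(products))


print(Interval(1, 2) + Interval(3, 5))   # Interval(lo=4, hi=7)
print(Interval(-1, 2) * Interval(3, 4))  # Interval(lo=-4, hi=8)
```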

Probabilistic data structures

Probabilistic data structures are space- and time-efficient data structures that use randomization or hashing to answer queries approximately, with provable bounds on the error.

CAP Theorem

The CAP theorem states that a distributed data system can provide at most two of three guarantees at the same time: consistency, availability, and partition tolerance.

Hash functions

Hash functions are deterministic algorithms that map data of arbitrary size to fixed-size values called hashes, for purposes like data integrity checks and database lookups. Cryptographic hash functions are additionally designed to be one-way.
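
The two uses mentioned above can be illustrated with Python's standard `hashlib` module. The helper names `sha256_hex` and `bucket_for` are hypothetical, and SHA-256 is just one possible choice of hash function.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    # Fixed-size output (64 hex characters) regardless of input size.
    return hashlib.sha256(data).hexdigest()


def bucket_for(key: bytes, num_buckets: int) -> int:
    # Lookup use: reduce the hash to a bucket index in a table.
    return int(sha256_hex(key), 16) % num_buckets


# Integrity use: the same bytes always give the same digest.
assert sha256_hex(b"report.csv contents") == sha256_hex(b"report.csv contents")
print(len(sha256_hex(b"a")), len(sha256_hex(b"a" * 10_000)))  # 64 64
```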

Collision resistance

Collision resistance is the property of a cryptographic hash function that makes it computationally infeasible to find two different inputs that map to the same output hash.

Count Min Sketch

A Count Min Sketch is a probabilistic data structure used to estimate item frequencies and counts in data streams.
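
A minimal sketch of the structure: a small grid of counters, one hash per row, and the minimum across rows as the estimate. The `width` and `depth` defaults and the salted SHA-256 hashing are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # One (row, column) cell per row, using the row number as a salt.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))
```

With non-negative counts, the estimate can exceed the true frequency when items collide, but it never falls below it.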

Data pruning

Data pruning refers to database techniques that eliminate irrelevant data during query processing to minimize resource usage and improve performance.
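
One widely used form of pruning relies on per-chunk min/max statistics: a chunk whose value range cannot overlap the query filter is skipped without being read. The `Chunk` type and file names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    min_value: int  # smallest value stored in the chunk
    max_value: int  # largest value stored in the chunk


def prune(chunks, lo, hi):
    """Keep only chunks that may contain a value x with lo <= x <= hi."""
    return [c for c in chunks if c.max_value >= lo and c.min_value <= hi]


chunks = [
    Chunk("part-0", 0, 99),
    Chunk("part-1", 100, 199),
    Chunk("part-2", 200, 299),
]
# A filter of 150 <= x <= 160 only needs to read part-1.
print([c.name for c in prune(chunks, 150, 160)])  # ['part-1']
```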

Data Processing

Lambda architecture

Lambda architecture is a big data processing pattern that combines batch and real-time stream processing to get both high throughput and low-latency querying.
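
As a toy illustration of the pattern, the sketch below keeps a batch view computed over the full history and a speed view over events that arrived since the last batch run, then merges both at query time. The function names and the page-view counting example are assumptions made for illustration.

```python
from collections import Counter


def batch_view(historical_events):
    # Batch layer: periodically recompute counts from the complete history.
    return Counter(historical_events)


def speed_view(recent_events):
    # Speed layer: incrementally count events not yet covered by a batch run.
    return Counter(recent_events)


def query(key, batch, speed):
    # Serving layer: merge both views to answer with fresh totals.
    return batch[key] + speed[key]


batch = batch_view(["home", "home", "pricing"])
speed = speed_view(["home"])
print(query("home", batch, speed))  # 3
```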

Unified processing

Unified processing refers to data pipeline architectures that handle batch and real-time processing using a single processing engine, avoiding the complexities of hybrid systems.

Batch processing

Batch processing is the execution of a series of jobs over an accumulated set of data, without user interaction, to process high volumes of data efficiently.

Kappa architecture

Kappa architecture is a big data processing pattern that uses stream processing for both real-time and historical analytics, avoiding the complexity of hybrid stream and batch processing.

Query Execution

Partitioning

Database partitioning refers to splitting large tables into smaller, independent pieces called partitions stored across different filegroups, drives or nodes.
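
Range partitioning is one simple scheme: sorted boundary values decide which partition a row belongs to. The boundaries and the `partition_for` helper below are hypothetical.

```python
import bisect


def partition_for(value, boundaries):
    """Return the partition index for value, given sorted upper boundaries.

    With boundaries [100, 200]: partition 0 holds values below 100,
    partition 1 holds 100..199, and partition 2 holds 200 and above.
    """
    return bisect.bisect_right(boundaries, value)


boundaries = [100, 200]
print([partition_for(v, boundaries) for v in (50, 100, 150, 250)])  # [0, 1, 1, 2]
```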

Distributed Execution

Distributed execution refers to techniques to execute database queries efficiently across clustered servers or nodes, dividing work to utilize parallel resources.

Parallel Execution

Parallel execution refers to techniques for speeding up database query processing by leveraging multiple CPUs, servers, or resources concurrently.
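
The basic shape of the technique is to split the input, compute partial results concurrently, then combine them. The sketch below does this for a sum, using a thread pool from Python's standard library. `parallel_sum` is a hypothetical helper. In CPython, threads will not actually speed up CPU-bound work like this, so treat it as an illustration of the split-and-combine structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    # Split the input into roughly equal chunks, one per worker.
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker computes a partial sum; the partials are then combined.
        return sum(pool.map(sum, parts))


print(parallel_sum(list(range(1000))))  # 499500
```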

Query Execution

Query execution is the process of carrying out the steps of a generated execution plan to retrieve the results of a database query.

Data Storage and Sources

Graph Database

A graph database stores data in a graph structure with nodes, edges and properties to represent and query relationships between connected data entities.

Key-value Store

A key-value store is a type of NoSQL database optimized for storing, retrieving and managing associative arrays of key-value pairs.

Data Warehouse

A data warehouse is a centralized data management system designed to enable business reporting, analytics, and data insights.

Message Broker

A message broker is a software system that facilitates communications between distributed applications and services by transferring messages in a reliable and scalable manner.

Time-series Database ("TSDB")

A time-series database (TSDB) is a database engineered and optimized for handling time-series data, where each data point contains a timestamp.

Relational Database

A relational database is a type of database that stores data in tables (relations) of rows and columns and provides access to it through the relationships defined between those tables.

Data Processing Engine

A data processing engine is a distributed software system designed for high-performance data transformation, analytics, and machine learning workloads on large volumes of data.

Data Lake

A data lake is a scalable data repository that stores vast amounts of raw data in its native formats until needed.

Document Store

A document store is a database that manages collections of JSON, XML, or other hierarchical documents and provides querying and indexing on document contents.

Spatial Database

A spatial database is a database optimized to store, query and manipulate geographic information system (GIS) data like location coordinates, topology, and associated attributes.

RDF Store

An RDF store is a graph database optimized for storing and querying RDF triple data to represent facts and relationships.

Data Orchestrator

A data orchestrator is a middleware tool that facilitates the automation of data flows between diverse systems such as data storage systems (e.g. databases), data processing engines (e.g. analytics engines) and APIs (e.g. SaaS platforms for data enrichment).

Vector Database

A vector database is designed to efficiently store and query vector representations of data for applications like search, recommendations, and AI.
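
At its core, a vector query ranks stored vectors by similarity to a query vector. The sketch below does this by brute force with cosine similarity; production systems typically use approximate indexes instead. The function names and the toy two-dimensional vectors are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=1):
    """Return the names of the k stored vectors most similar to query."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(nearest([1.0, 0.1], vectors, k=2))  # ['a', 'c']
```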

Search Engine (Database)

A search engine database is designed to store, index, and query full text content to enable fast text search and retrieval.
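
Full-text indexing is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below assumes whitespace tokenization and AND semantics for multi-term queries; both choices, and the function names, are illustrative.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


docs = {1: "fast text search", 2: "fast key lookups", 3: "Text retrieval"}
index = build_index(docs)
print(search(index, "fast text"))  # {1}
```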
