- Informatica
- Redwood City, CA
- in/arun-mohan-subramonian-4823b674
Stars
A clone of .GEARS' Flappy Bird in just over 1000 lines of C
Trench: Open-Source Analytics Infrastructure. A single production-ready Docker image built on ClickHouse, Kafka, and Node.js for tracking events and page views. Easily build product analytics dashboa…
A multithreaded execution framework inspired by Apache Cassandra’s concurrent execution model
QuestDB is a high performance, open-source, time-series database
🚀✨ Help beginners to contribute to open source projects
Contains Spark DataFrame solutions to LeetCode questions
Code for generating tables of data and tabular files (CSV, JSON, Parquet, etc.) for testing
Collection of experiments to carve out the differences between two types of relational query processing engines: Vectorizing (interpretation based) engines and compiling engines.
Code repo for "An Empirical Evaluation of Columnar Storage Formats" VLDB Vol 17
A basic introduction to coding in modern C++.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
An elastic load balancer implemented in Java
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Uniffle is a high performance, general purpose Remote Shuffle Service.
DirectMemory is a cache implementation featuring off-heap memory storage (à la BigMemory) to enable caching of large (or large numbers of) objects without degrading JVM performance. ATTENTION PLEAS…
A hashmap implementation for Java that stores map entries off-heap
DuckDB is an analytical in-process SQL database management system
A toy Java implementation of a query engine over in-memory, columnar, schema-ful data
Design and implementation of a relational database system in Java. Supports tables, column types, constraints, keys, SQL commands, pretty printing, and CSV import/export.
Next generation distributed, event-driven, parallel config management!
Curated list of project-based tutorials
Execute dependent/independent tasks in a reliable way
Spark shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
Comprehensive guide, algorithms and tools on distributed systems
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.