Senior Data Engineer @ Shaped

The fastest path to relevant recommendations and search

Job Details

Williamsburg
6 - 11 years of experience
In office 5 days per week
$130,000 - $200,000

About Shaped

Shaped is the fastest path to relevant recommendation and search systems. We help companies turn their behavioral data into truly relevant product and website experiences.

We're a Series A company based in Brooklyn, New York, backed by top investors from Madrona and Y Combinator and by executives from Meta, Google, Amazon and Uber!


Job Description

Skills: Kafka, SQL, Spark, Python

We are looking for a data engineer to design, build and optimize Shaped's real-time streaming and batch infrastructure. You will be a founding engineer working to reliably ingest customer data (through both batch and real-time processing) into our state-of-the-art AI discovery engine. As one of Shaped's early employees, you will help shape our product, culture and vision.

  • Required skills include Python and data warehouses (such as ClickHouse, Snowflake or BigQuery)
  • Nice-to-have skills include dbt, Meltano, Kubernetes and Apache Flink (or other stream-processing frameworks)

We're excited to work with you. Come build the future of AI with us!

Technology

Customers typically use Shaped as follows:

  1. Connect your data stack, e.g. data warehouse, database or analytics applications.
  2. Define your model. This includes your optimization objective (e.g. clicks vs purchases vs shares), item and user catalogs, feature types and model types.
  3. Consume your results from our real-time, scalable ranking endpoints.
  4. Evaluate uplift and model results on our dashboard.

To power all of this, we've built a multi-tenant, real-time machine learning architecture that automatically sets up ingestion in both real-time and batch modes, transforms the data and stores it in our proprietary feature/vector store. Ranking models are continuously optimized and fine-tuned based on real-time feedback, ensuring customers see the most relevant and up-to-date results possible.

From a machine-learning perspective, we use state-of-the-art, large-scale neural encoding models to understand multi-modal data types such as image, text, audio and tabular data. We provide an exhaustive library of retrieval, ranking and ordering algorithms, which are selected based on the specified model definition.

We use both AWS and GCP for cloud, Kubernetes for serverless infrastructure, and Python, JavaScript and Rust as our languages.

Benefits

Health Insurance, Paid Time Off, Vision Insurance, Dental Insurance

Tags

Kafka, Stream, Python, Spark, SQL