Assaf Pinhasi

Global Fintech company

  • Design inference infrastructure: online, batch and streaming
  • Implement an offline inference solution

Technology stack: AWS, SageMaker, Python, Kubernetes, Kafka

Israeli unicorn startup - Deep Learning for computer vision

  • Redesign the entire inference infrastructure

Example projects

Zebra Medical Vision (acquired by Nanox)

  • Remote training infrastructure based on Kubernetes
  • Adoption of experiment tracking tools
  • Cross-research team pipeline tool for ease of experimentation
  • Automated large-scale runs on real-world data to detect algorithm anomalies before release
  • CI for testing the inference code and packaging it along with the model as a Docker container

3DFY.ai

Large-scale distributed training environment based on Kubernetes and PyTorch Elastic.

Multiple Israeli startups

  • Adoption of standard pipeline tools
  • Adoption of experiment tracking
  • Design and adoption of model reproducibility processes, packaging and versioning
  • Design and implement testing methodologies based on individual product, data and algorithm quality risks

Example projects

Stream/Batch processing platform @PayPal

I led the group in charge of a new Big Data platform for PayPal’s Risk org. The platform enabled PayPal engineers and scientists to author data pipelines combining stream and batch processing, and covered the full stack: infra, SDK, QA and DevOps tooling.

Technology stack: Spark Streaming, Apache Beam, Kafka, Aerospike, Elasticsearch, Graphite, Grafana, ELK and more…

InfoQ session by team members

Zebra Medical (acquired by Nanox) - Data platform for Deep Learning (CV)

  • Batch pipelines that ingest PB-scale medical imaging corpora
  • Complex data de-identification protocols
  • Full-stack annotation system for medical imaging, as part of the pipeline
  • Automated, large-scale pre-processing on Kubernetes for image cropping and augmentation
  • Data warehouse that enables searching for images by metadata, as well as retrieving annotations and running evaluations

Technology stack: AWS, Kubernetes, MongoDB, Elasticsearch, PostgreSQL, Redis, Node.js, Angular, and more…

Global Fintech company

  • Design the entire data infrastructure for the ML group, from events to feature engineering
  • Implement a feature store solution (online / offline, batch / streaming)

Technology stack: Spark, Kafka, Python, Cassandra, AWS

3DFY.ai - Large-scale data preprocessing for computer vision

  • Serverless cloud platform
  • Perform expensive 3D rendering from meshes and store the results as files/objects

Technology stack: AWS, GCP, Docker, AWS Batch, Python

Several Israeli computer vision startups

  • A serverless data pipeline that ingests images and annotations
  • ELT pipeline that transforms this into a searchable data warehouse

Technology stack: AWS, GCP, Athena, Kinesis, Lambda, Snowflake, Redshift, dbt, and more…

    © Assaf Pinhasi 2022