Demetrio Rodriguez, Developer in Vienna, Austria

Demetrio Rodriguez

Verified Expert in Engineering

Data Scientist and Developer

Location
Vienna, Austria
Toptal Member Since
May 25, 2021

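Demetrio is an experienced data scientist who is comfortable with the entire data science stack. He excels at developing complex machine learning models and applying highly tractable and robust statistical methods. Besides his technical and statistical expertise, Demetrio's presentation style and strong visualizations effortlessly convey the key takeaways of highly technical results to any audience.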

Portfolio

Fortune 100 Foods & Beverages company
Python, Data Science, SQL, Machine Learning, Pandas, NumPy, Matplotlib, Seaborn...
Sclable Academy
Python, Git, Docker, Discrete Optimization, IFC, Open Cascade Technology (OCCT)...
Parkbob
R, Satellite Images, Geospatial Data, Geospatial Analytics, Spatial Analysis...

Experience

Availability

Part-time

Preferred Environment

Linux, PyCharm, Slack, Git, GitHub, Jupyter Notebook, Agile Data Science, Agile Workflow, Python, Rapid Prototyping

The most amazing...

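...experience was leading a small team to solve an unsolved problem in building planning: detecting the reusability of prefabricated components.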

Work Experience

Senior Data Scientist

2021 - 2022
Fortune 100 Foods & Beverages company
  • Scoped the project, starting from scratch in a new domain: interviewed stakeholders and validated their problems with data.
  • Identified multiple opportunities for an ML solution to alleviate some out-of-stock burdens.
  • Built an end-to-end machine learning solution to predict out-of-stock inventory when dealing with a large eCommerce retailer.
  • Led a project initiative to forecast drop-ship volumes.
Technologies: Python, Data Science, SQL, Machine Learning, Pandas, NumPy, Matplotlib, Seaborn, XGBoost, Time Series, Time Series Analysis, Forecasting, ETL, Docker, Modeling, Git, GitHub, Code Review, Geospatial Data, Geospatial Analytics, GIS, Mathematics, Statistics, TensorFlow, Spatial Analysis, Keras, Deep Learning, Data Engineering, Scikit-learn, Statistical Analysis, Supervised Machine Learning, Jupyter, Dashboards, Statistical Data Analysis, Predictive Modeling, Data Analysis, Classification, Explainable Artificial Intelligence (XAI), Models, Communication, Version Control Systems, Google Colaboratory (Colab), Data Modeling, Exploratory Data Analysis, Neural Networks, Regression, Artificial Neural Networks (ANN), Snowflake, Linear Regression, Data Pipelines, Model Development, Amazon Web Services (AWS)

Data Scientist | Project Tech Lead

2019 - 2020
Sclable Academy
  • Supervised other data scientists regarding their tasks on the project.
  • Created technical requirements—Jira stories and tasks—for the upcoming features.
  • Conducted extensive code reviews and established code standards.
  • Managed the relationship with the development team regarding the integration and deployment of the solution.
  • Worked closely with the project manager on the direction of the project, timelines, team performance, and so on.
  • Built a data pipeline that processed 3D models of buildings into graphs.
  • Translated product requirements into a formal graph optimization problem.
  • Researched and implemented optimization techniques.
  • Visualized intermediate and final model results in a customer-friendly way.
  • Built an uncertainty prediction model for wholesales using a mixture density network.
Technologies: Python, Git, Docker, Discrete Optimization, IFC, Open Cascade Technology (OCCT), NetworkX, API Integration, Code Review, TensorFlow, Agile Project Management, Graphs, Algorithms, Building Information Modeling (BIM), Keras, Deep Learning, Machine Learning, PredictionIO, Linux, Data Visualization, Data Science, Supervised Machine Learning, Bash, Matplotlib, Scikit-learn, Pandas, Statistical Analysis, Jupyter, PyCharm, Slack, GitHub, Jupyter Notebook, Mathematics, Statistics, Modeling, Technical Writing, Optimization, Data Engineering, Automation, Agile Workflow, Agile Sprints, Agile Data Science, Rapid Prototyping, NumPy, Cloud Computing, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Agile, Google Cloud Platform (GCP), Seaborn, Models, Communication, Version Control Systems, Google Colaboratory (Colab), Data Modeling, Exploratory Data Analysis, Neural Networks, Regression, Artificial Neural Networks (ANN), Data Pipelines, Model Development, Amazon Web Services (AWS)

Data Scientist

2018 - 2019
Parkbob
  • Developed an NLP solution based on a bidirectional LSTM, focused on simplifying traffic sign text into a simple machine-readable format.
  • Developed multiple NLP prototypes that supported workflow of GIS department.
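  • Extended a prototype solution for extracting parking space availability from satellite images and used it in the first production-grade scenario.
  • Mentored a junior data scientist in developing a prototype for car-sharing fleet efficiency.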
  • Took over the scaling and deployment of the above in new markets.
  • Presented model improvements and results in new markets to the customers.
  • Participated in the hiring process and supervised interns.
Technologies: R, Satellite Images, Geospatial Data, Geospatial Analytics, Spatial Analysis, QGIS, GIS, Python, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), TensorFlow, Keras, Fleet Management, Mobility, Linux, Data Visualization, Data Science, Spatial Statistics, Supervised Machine Learning, Bash, Matplotlib, Scikit-learn, Pandas, Statistical Analysis, PyCharm, Slack, Git, Jupyter Notebook, Mathematics, Statistics, Modeling, Technical Writing, Code Review, Deep Learning, Machine Learning, Data Engineering, LaTeX, Automation, Agile Workflow, Agile Sprints, Agile Data Science, Rapid Prototyping, NumPy, Dashboards, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Agile, ETL, Time Series, Time Series Analysis, Seaborn, Forecasting, Classification, XGBoost, Models, Communication, Version Control Systems, Data Modeling, LSTM, Exploratory Data Analysis, Neural Networks, Regression, Artificial Neural Networks (ANN), Linear Regression, Data Pipelines, Model Development

Data Scientist

2016 - 2017
Record Evolution
  • Worked on a 30TB large analytics-oriented data warehouse project.
  • Took over responsibility for the analytics layer of the solution.
  • Translated all existing analytics to a new data segment, including extensive performance optimization, new requirements, interpretation, and visualization.
  • Cooperated closely with the client regarding enhancements in the analytics layer.
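  • Initiated systematic quality assurance of the aggregated data, which uncovered critical inconsistencies that had gone unnoticed for years.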
  • Performed various adjustments in the ETL process and services.
  • Developed an IoT prototype that collected sensor data from a Raspberry Pi and uploaded it to the cloud.
Technologies: SQL, PostgreSQL, Python, Business Intelligence (BI), Risk Modeling, Data Engineering, Continuous Integration (CI), Docker, Kubernetes, Linux, Data Visualization, Data Science, Bash, Matplotlib, Pandas, Statistical Analysis, Slack, Git, GitHub, Mathematics, Statistics, Technical Writing, Code Review, Automation, Cloud Computing, Dashboards, Statistical Data Analysis, Predictive Modeling, Data Analysis, Google Cloud Platform (GCP), ETL, Time Series, Time Series Analysis, Forecasting, Classification, Models, Communication, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Data Pipelines

Junior Researcher

2016 - 2016
The SAFE-FDZ
  • Refactored an economic model's existing numerical solution.
  • Extended the model analytically and made extensive improvements to the numerical solution with a focus on algorithmic efficiency.
  • Contributed substantially to a working paper by finding and correcting mathematical errors.
Technologies: MATLAB, Numerical Methods, Algorithms, Dynamic Programming, Optimization, Linux, Git, Mathematics, Modeling, Scientific Data Analysis, Technical Writing, LaTeX, Research, Dynamic Systems Modeling, Models, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Model Development

Research Assistant

2015 - 2015
Deutsche Bundesbank, Research Centre
  • Constructed a unique multi-country dataset regarding inflation targeting by central banks.
  • Developed analytical and numerical solutions for DSGE economic models addressing agents' expectations and inflation dynamics.
  • Automated model-mining and generation of structured reports.
  • Visualized, documented, interpreted, and presented the outcomes of our research.
Technologies: MATLAB, LaTeX, Numerical Methods, Research, Dynamic Systems Modeling, Linux, Data Visualization, Data Science, Statistical Analysis, Git, Mathematics, Modeling, Scientific Data Analysis, Technical Writing, Optimization, Dynamic Programming, Automation, Rapid Prototyping, NumPy, Dashboards, Data Analysis, Time Series, Time Series Analysis, Forecasting, Models, Communication, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Linear Regression, Data Pipelines, Model Development

Research Assistant (Part-time)

2014 - 2015
Center for European Economic Research
  • Prepared a large scientific dataset (approximately 39 million entries) that was available only through remote, restricted access.
  • Performed statistical data analysis and presented the findings to the research team.
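  • Developed a standardized results-generation pipeline using a combination of Stata and Python.
  • Assisted the research team in implementing models in Python, including visualizing simulation output, writing unit tests, and optimizing numerical procedures.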
Technologies: Statistical Analysis, Research, Automation, Python, Data Visualization, Data Science, Matplotlib, Pandas, Git, Mathematics, Statistics, Scientific Data Analysis, Technical Writing, LaTeX, NumPy, Cloud Computing, Dashboards, Statistical Data Analysis, Data Analysis, Time Series, Time Series Analysis, Forecasting, ETL, Models, Communication, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Linear Regression, Data Pipelines

Staying Ahead of an eCommerce Platform as a Manufacturer

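A major food and beverage manufacturer saw a huge shift in consumer purchasing behavior toward online retail. Most of its eCommerce revenue came from selling products on a single established platform.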

However, this shift was somewhat unstable: some products were declared out of stock and taken off the platform, creating massive revenue losses. Yet, only a small portion of those products was experiencing supply-chain shortages. For most, it was a combination of missed metrics like "delivery window," "weeks of cover," "past orders fill rate," etc. The eCommerce platform did not share the inner workings of its algorithms.

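To facilitate weekly planning, I developed a machine learning model to predict this out-of-stock behavior two weeks in advance. I combined the metrics reported by the eCommerce platform, internal supply-chain data, marketing planning calendar, and more. The problem was formulated as a time series classification and solved using gradient-boosted trees whose inputs were various weekly aggregates over the past ten weeks, combined with known future static factors (e.g., holidays and promotions). I automated the output into a dashboard and delivered it to the stakeholders every Monday.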
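The windowing behind this kind of time series classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `build_examples`, the single weekly metric, and the single promotion flag are all hypothetical stand-ins for the many aggregates and calendar factors mentioned above. Only the ten-week window and the two-week horizon come from the description.

```python
from typing import List, Sequence, Tuple

WINDOW = 10   # weeks of history used as input (from the description)
HORIZON = 2   # weeks ahead to predict (from the description)


def build_examples(
    weekly_metric: Sequence[float],
    out_of_stock: Sequence[int],
    is_promo_week: Sequence[int],
) -> List[Tuple[List[float], int]]:
    """Turn one product's weekly history into (features, label) pairs.

    Features at week t: the metric for the ten weeks ending at t, plus the
    promotion flag of the target week t + 2, which is known in advance
    from the planning calendar.
    Label: whether the product is out of stock in week t + 2.
    All names here are hypothetical and chosen for illustration.
    """
    n = len(weekly_metric)
    if not (n == len(out_of_stock) == len(is_promo_week)):
        raise ValueError("all series must have the same length")

    examples = []
    for t in range(WINDOW - 1, n - HORIZON):
        history = [float(v) for v in weekly_metric[t - WINDOW + 1 : t + 1]]
        known_future = [float(is_promo_week[t + HORIZON])]
        label = int(out_of_stock[t + HORIZON])
        examples.append((history + known_future, label))
    return examples
```

The resulting feature rows and labels could then be passed to any gradient-boosted tree classifier; which library and hyperparameters were used in the project is not stated here.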

NLP: Text Simplification | Information Retrieval

http://static1.hotcarsimages.com/wordpress/wp-content/uploads/2018/06/Pick-One.jpg
Traffic signs come in all shapes and forms. And very often, the most important part of a traffic sign is the text below it, especially if the text says when the sign is valid, e.g., "MON 6 PM-8 PM." Those texts are supposed to be roughly standardized and structured. Therefore, our development team tackled the problem of converting the texts into strict rules by writing comprehensive regular expressions. This worked very well for a while but slowly became unmaintainable, so a scalable approach became necessary.

As our team already had a very comprehensive Regex-based parser, my suggestion was not to train an end-to-end system but a text simplifier. It is almost a machine translation task: all of "MON," "MND," "Mondays" would become "Monday," "Noon-3 PM" would be translated as "12 PM-3 PM," "No Littering!" would be ignored.
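To make the target of the simplifier concrete, the sketch below reproduces the input-to-output mapping from the examples above with a hand-written lookup. It is only an illustration of the contract between the simplifier and the downstream parser: the trained model learned such mappings from data, and the tables, pattern, and function names here are hypothetical.

```python
import re
from typing import Optional

# Hypothetical lookup tables covering only the examples in the text.
DAY_ALIASES = {"MON": "Monday", "MND": "Monday", "MONDAYS": "Monday"}
TIME_ALIASES = {"NOON": "12 PM"}
# Already-canonical clock times such as "3 PM" or "6:30 AM".
TIME_PATTERN = re.compile(r"\d{1,2}(:\d{2})? (AM|PM)")


def simplify_time(text: str) -> Optional[str]:
    """Return the canonical form of a clock time, or None if unknown."""
    key = " ".join(text.upper().split())
    if key in TIME_ALIASES:
        return TIME_ALIASES[key]
    if TIME_PATTERN.fullmatch(key):
        return key
    return None


def simplify(text: str) -> str:
    """Map raw sign text to a canonical form; '' means 'ignore this text'."""
    text = text.strip()
    day = DAY_ALIASES.get(text.upper())
    if day is not None:
        return day
    parts = text.split("-")
    if len(parts) == 2:
        start, end = simplify_time(parts[0]), simplify_time(parts[1])
        if start and end:
            return f"{start}-{end}"
    return ""  # irrelevant text such as "No Littering!"
```

For instance, `simplify("Noon-3 PM")` yields `"12 PM-3 PM"`, which an existing regex-based parser can then handle with a single rule instead of one rule per spelling variant.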

For this problem, I trained a state-of-the-art (at the time) NLP model—a bidirectional LSTM with attention. After just two months of development, it achieved a reasonable accuracy (92%) and was suitable for a human-in-the-loop deployment. Additionally, we requested a research grant to scale the solution further.

Wholesales Forecast with Uncertainty

Machine learning solutions often focus on predicting a single number. This is not always very useful in wholesale, as actual daily sales can vary quite a bit. Such variation, when not addressed, results in either overfilled storage or empty shelves. To perform capacity planning effectively, a manager should know the range of outcomes that could occur with some degree of certainty.

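To address this challenge, I trained a mixture density neural network. In this architecture, the output layer holds the parameters of a mixture distribution (gamma in this case), and the negative log-likelihood is used as the training loss. This makes it possible to capture multimodal conditional distributions or heavily right-skewed distributions. As the data came from different stores in different geographic regions and exhibited strong trend changes, it was first de-trended, then standardized before being modeled by the mixture density network.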
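The loss for a gamma mixture can be written out in a few lines. The sketch below is a plain-Python illustration for a single observation, assuming a shape/rate parameterization and strictly positive targets; the function names are hypothetical, and in a real network the mixture weights would typically come from a softmax and the shapes and rates from a positive activation, with the loss averaged over a batch in the deep learning framework.

```python
import math
from typing import Sequence


def gamma_log_pdf(y: float, shape: float, rate: float) -> float:
    """Log density of a Gamma(shape, rate) distribution at y > 0."""
    return (
        shape * math.log(rate)
        - math.lgamma(shape)
        + (shape - 1.0) * math.log(y)
        - rate * y
    )


def mixture_nll(
    y: float,
    weights: Sequence[float],
    shapes: Sequence[float],
    rates: Sequence[float],
) -> float:
    """Negative log-likelihood of y under a gamma mixture.

    weights must be positive and sum to 1; shapes and rates must be positive.
    The log-sum-exp trick keeps the computation numerically stable.
    """
    log_terms = [
        math.log(w) + gamma_log_pdf(y, a, b)
        for w, a, b in zip(weights, shapes, rates)
    ]
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

Minimizing this quantity over the training data pushes the predicted mixture to place high density on the observed sales, and quantiles of the fitted mixture then give the range of outcomes needed for capacity planning.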

Satellite-based Ground Truth for Parking Availability

http://medium.com/ubiq/satellite-based-ground-truth-for-parking-availability-e477c7e1b412
Predicting on-street parking space occupancy is an extremely challenging problem, mainly because there are no reliable sources of ground truth.

Our solution was to use satellite imagery as a scalable way to assess the parking situation in multiple cities worldwide simultaneously. The main challenge is not to detect cars on the satellite images, which is just an object-detection problem (a very nasty one, however). It's about putting together a multi-stage pipeline that uses machine learning, heuristic rules, and legal restrictions to output how many free parking spots there are on a street.

The blog article was written by me and explains our approach in great detail.

Car Sharing Fleet Efficiency

http://medium.com/ubiq/the-art-of-fleet-rebalancing-our-ai-tool-to-increase-the-utilization-of-every-single-vehicle-c86731f98c39
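A well-known car-sharing company approached us with a problem: some of their cars were picked up within minutes of being parked, while others sat idle for days. They already knew it had to do with the geography of the city, population density, major transport hubs, time of day, and so on.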

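So, we proposed building a machine learning model that would take all these influencing factors into account and determine when and where car demand is high, so that cars could be relocated from low-demand areas.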

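Under my supervision and mentoring, a junior data scientist on their first project and I delivered a successful MVP. Working with the project team, we found a suitable deployment strategy and launched the first version of the product four months after obtaining the initial dataset.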

Subsequently, I took over scaling the solution to multiple cities, adjusting its real-time efficiency, and adding multiple features based on the client's requests and model performance.

Eventually, this became the most successful product of the startup; they then rebranded and now offer it as their only service.

Languages

Python, R, SQL, Bash, Regex, Snowflake

Libraries/APIs

Pandas, Scikit-learn, Matplotlib, NumPy, NetworkX, TensorFlow, Keras, XGBoost, LSTM, PyTorch

Tools

PyCharm, Git, GitHub, GIS, LaTeX, Jupyter, Seaborn, Slack, PredictionIO, MATLAB

Paradigms

Data Science, Agile Workflow, Rapid Prototyping, Agile, Agile Project Management, Automation, ETL, Building Information Modeling (BIM), Business Intelligence (BI), Continuous Integration (CI), Dynamic Programming

Platforms

Jupyter Notebook, Docker, Linux, Amazon Web Services (AWS), Open Cascade Technology (OCCT), Kubernetes, Google Cloud Platform (GCP), Databricks

Storage

PostgreSQL, Data Pipelines

Other

Statistics, Modeling, Scientific Data Analysis, Technical Writing, Optimization, Code Review, Geospatial Data, Geospatial Analytics, Spatial Analysis, Machine Learning, Data Engineering, Statistical Analysis, Supervised Machine Learning, Data Visualization, Time Series, Time Series Analysis, Agile Sprints, Agile Data Science, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Forecasting, Models, Communication, Version Control Systems, Data Modeling, Data Aggregation, Data Analytics, Exploratory Data Analysis, Regression, Linear Regression, Mathematics, Satellite Images, Natural Language Processing (NLP), Mobility, Deep Learning, Dynamic Systems Modeling, Spatial Statistics, Dashboards, Classification, Google Colaboratory (Colab), Neural Networks, Artificial Neural Networks (ANN), Model Development, GPT, Generative Pre-trained Transformers (GPT), Discrete Optimization, IFC, API Integration, QGIS, Fleet Management, Graphs, Algorithms, Risk Modeling, Numerical Methods, Research, Cloud Computing, Explainable Artificial Intelligence (XAI), ARIMA, ARIMA Models

2012 - 2015

Bachelor's Degree in Economics and Mathematics

University of Mannheim - Mannheim, Germany
