Faisal Malik Widya Prasetya
Verified Expert in Engineering
Data Engineer and Developer
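Faisal is a data engineer specializing in cloud data technologies such as Google and AWS and in end-to-end data engineering processes. From designing architecture and building infrastructure to developing pipeline operations, he is highly adaptable to new cloud-based, open source, or SaaS technologies. Faisal has extensive experience contributing to early-stage startups, either by building end-to-end data pipelines directly or by consulting in his areas of expertise.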
Portfolio
Experience
Availability
Preferred Environment
Visual Studio Code (VS Code), Conda, Linux, Docker, Docker Compose, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jira, OpenAI
The most amazing...
...project I've worked on was implementing a cost optimization strategy on a client's data warehouse, reducing BI usage costs by up to 100 times.
Work Experience
Web Scraping Expert
Burak Karakaya
- Developed a real-time web scraper to scrape data from various sources, such as Twitter, Binance Futures Leaderboard, etc., to feed data to the client's trading bot. The scraper can ingest tweets within 200 ms after it is published.
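- Provisioned infrastructure on AWS for high-performance networking so the scraper can work in real time. I set up IP rotation so the scraper is not blocked by the news sources' IP rate limits.
- Provided convenient interfaces for non-technical users to manage and operate the scraper. I used Streamlit and FastAPI to develop these interfaces.
- Used Redis and high-performance Python extensions, such as those written in C, to improve the scraper's storage and runtime performance.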
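
The IP rotation mentioned above can be illustrated with a minimal round-robin sketch. It is not the project's actual implementation: the `ProxyRotator` class and the proxy addresses are hypothetical, and a real setup would also handle failed proxies and pass the chosen proxy to the HTTP client.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies in round-robin order so requests spread across IPs."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def next_proxy(self):
        """Return the proxy to use for the next outgoing request."""
        return next(self._pool)


# Hypothetical proxy addresses (documentation-only IP range).
rotator = ProxyRotator([
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
])

# Each request takes the next proxy; the fourth wraps around to the first.
used = [rotator.next_proxy() for _ in range(4)]
```

Spreading requests this way keeps the request count per source IP lower than sending everything from a single address.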
Data Engineer
XpressLane, Inc.
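- Developed scraping tools to scrape data from various websites and push it to BigQuery.
- Created development and operations documentation so the client can maintain the solution and develop more features in the future.
- Delivered reports and dashboards built from the scraped data to help the client make better decisions for M&A use cases.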
Senior Data Engineer
Toptal
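- Designed and implemented a robust data pipeline that extracts data from multiple marketing tools and APIs, such as Google Ads, Facebook Ads, and Twitter Ads, and transfers it to BigQuery using an in-house data pipeline tool based on Luigi.
- Created data pipeline solutions that efficiently extracted data from various learning platforms, such as Polly, Udemy, and Lessonly, and consolidated it in BigQuery using Composer, a managed Apache Airflow service provided by GCP.
- Participated in the brainstorming session on splitting the data engineering team and proposed dividing it into a data platform team and an analytics engineering team. The analytics engineering team focuses on ETL logic, while the data platform team maintains the infrastructure.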
Data Engineer
QuantumBlack
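- Developed an internal data analytics tool that simplifies deployment at client sites. The feature I built ingests data from various sources and stores it incrementally on Snowflake.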
- Handled a client request to build a data analytics pipeline and APIs.
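- Worked closely with the client's analytics team and leadership to gather analytics requirements and plan carefully from architecture design to implementation and delivery.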
Senior Data Engineer
Flip
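- Built a data analytics ecosystem using native Google Cloud Platform technologies, such as Datastream, Google Cloud Storage, Pub/Sub, Dataflow, and BigQuery.
- Reduced the analytics wait time for one large report from a worst case of 3 hours to 30 seconds.
- Maintained the legacy data analytics technology on MySQL and server cron jobs, creating a scheduled job for a heavy but frequently used query. The heavy query can be accessed in less than 30 minutes with daily data freshness.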
- Built the data engineering team and onboarded team members on the legacy, current, and future implementation.
Data Engineer
Pintu
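- Developed an ELT data pipeline on Amazon EC2 that moves data from various sources (MySQL, PostgreSQL, MongoDB, Google Sheets, crypto exchange APIs) to the BigQuery data warehouse. The instance is turned on and off by AWS Lambda, triggered by a CloudWatch schedule.
- Implemented partitioning, clustering, and materialized views on BigQuery, reducing the analytics cost by 100 times.
- Collaborated with finance experts to develop the best market-making strategy. Implemented and improved the model from a published paper, increasing the liquidity and market activity of the owned asset by 67%.
- Developed a fraud detection system that alerts on fraudulent activity in the event of a system security breach. The alert notified the executive team, and the fraudster was caught within four hours. It secured $2 million worth of assets.
- Trained business users to develop their own BI reports using Metabase and Google Data Studio. It led to 70% of Metabase reports being created by the business team, while the other 30% required complex queries.
- Led the data analytics team and established an agile culture by running sprint planning, standup, and sprint retrospective meetings. This made it possible to track business user requests, data pipeline issues, and improvements.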
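
Why partitioning lowers query cost can be shown with a toy model. The sketch below is not BigQuery and does not reproduce the 100-fold figure above; the `PartitionedTable` class, the dates, and the row counts are made up for illustration. It only shows that a query filtered on the partition key reads one partition instead of the whole table.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy table that stores rows in one bucket per calendar day."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, day, row):
        self._partitions[day].append(row)

    def total_rows(self):
        """Number of rows a full, unpartitioned scan would read."""
        return sum(len(rows) for rows in self._partitions.values())

    def scan_day(self, day):
        """Read only the partition for `day`; return (rows, rows_scanned)."""
        rows = self._partitions.get(day, [])
        return rows, len(rows)


table = PartitionedTable()
for day_number in range(1, 11):        # 10 days of made-up data
    for i in range(100):               # 100 rows per day
        table.insert(date(2024, 1, day_number), {"id": i})

rows, scanned = table.scan_day(date(2024, 1, 5))
full_scan = table.total_rows()
```

In this toy setup a single-day query reads 100 rows where a full scan reads 1,000. In a warehouse that bills by data scanned, the same pruning effect is what reduces cost.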
Data Engineer
Kulina
- Developed ELT processes from application databases, third-party marketing tools, and Google Sheets to BigQuery using Stitch Data, which reduced the number of query conflicts on the production database, indirectly improving application performance.
- Developed the Snowflake schema on the data warehouse, increasing data visibility among the business team.
- Deployed, maintained, and administered several BI tools, such as Redash, Data Studio, and Metabase, to achieve data governance at the business unit level and to answer data-related questions with the appropriate tool.
Experience
NASA API Python Wrapper
http://pypi.org/project/python-nasa/
Scalable Web Scraper
Then for the transformation, we use PySpark deployed on Dataproc. We use serverless Spark on Dataproc to make our transformation pipeline cost-effective. We use GCS as the data lake, so all data scraped from the websites resides in GCS, as does the transformation output. The clean data will then be stored in BigQuery using the BigQuery load job, also orchestrated on Airflow. When the data arrives on BigQuery, the stakeholder dashboards update automatically with the latest data. We also set up a rotating proxy to avoid getting caught as a bot.
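
The stage ordering described above can be sketched as a dependency graph. This is a minimal illustration using Python's standard `graphlib` module rather than the Airflow API, and the task names are hypothetical labels for the stages, not identifiers from the project.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical labels for the stages described above.
dependencies = {
    "scrape_to_gcs": set(),
    "transform_with_pyspark": {"scrape_to_gcs"},
    "load_to_bigquery": {"transform_with_pyspark"},
}

# An orchestrator runs tasks so that every dependency completes first.
run_order = list(TopologicalSorter(dependencies).static_order())
```

An orchestrator such as Airflow expresses the same idea as a DAG, adding scheduling and retries on top of the ordering.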
Data Pipeline on GCP
Skills
Languages
Python, SQL, Snowflake, JavaScript, HTML, Python 3, T-SQL (Transact-SQL), Stored Procedure, GraphQL, CSS, PHP, Go, R, Scala
Frameworks
Django, Swagger, Flask, Hadoop, Scrapy, Apache Spark, Spark, Flutter, CodeIgniter
Libraries/APIs
Pandas, Asyncio, Python API, REST APIs, NumPy, Shapely, Scikit-learn, Node.js, OpenAPI, Amazon API, PySpark, Spark ML, OpenCV, Twitter API, SciPy, TensorFlow, Interactive Brokers API, Luigi
Tools
BigQuery, Apache Airflow, GitHub, AWS Glue, Microsoft Power BI, Tableau, Amazon Elastic MapReduce (EMR), Amazon QuickSight, AWS Step Functions, MySQL Performance Tuning, Amazon ElastiCache, Amazon Simple Notification Service (Amazon SNS), Git, Jupyter, Pytest, Kibana, Cloud Dataflow, Apache Beam, Celery, RabbitMQ, Amazon Simple Queue Service (SQS), Docker Compose, Redash, Amazon CloudWatch, Terraform, Amazon Athena, Amazon Redshift Spectrum, Looker, Amazon EKS, Google Analytics, Amazon Cognito, GIS, GRASS GIS, PhpStorm, Navicat, MongoDB Atlas, Stitch Data, Jira, Domo, Google Cloud Dataproc
Paradigms
Business Intelligence (BI), ETL, MapReduce, Stress Testing, REST, Data-driven Design, Design Patterns, Microservices, Microservices Architecture, Database Design, Kanban, Agile Project Management, Data Science, DevOps, Agile, Object-oriented Design (OOD), Object-oriented Programming (OOP), Distributed Computing, Dimensional Modeling
Platforms
Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda, AWS Elastic Beanstalk, SharePoint, Jupyter Notebook, Docker, Amazon EC2, Oracle Database, Azure, Apache Kafka, Oracle, Databricks, Firebase, Azure Synapse, Kubernetes, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW)
Storage
MySQL, PostgreSQL, Microsoft SQL Server, NoSQL, Data Lakes, Database Migration, Amazon Aurora, Data Pipelines, Elasticsearch, Databases, Amazon DynamoDB, Database Modeling, Data Integration, PL/SQL, Amazon S3 (AWS S3), MongoDB, Database Administration (DBA), Redshift, Neo4j, Dynamic SQL, Alibaba Cloud, Google Cloud, Google Cloud Storage, IIS SQL Server, Redis
Other
Conda, Machine Learning, Google BigQuery, Data Engineering, Data Modeling, Data Migration, ETL Tools, Data Analytics, Data Analysis, Data Architecture, Data Management, Amazon RDS, CDC, Data Build Tool (dbt), Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Project Planning, Web Scraping, Scraping, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Shell Scripting, Benchmarking, Performance, Performance Testing, Caching, Data Reporting, Software Architecture, Back-end, Artificial Intelligence (AI), Data Scraping, PDF Scraping, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Automated Trading Software, SaaS, GeoPandas, API Integration, Natural Language Processing (NLP), Serverless, Lint, Consumer Packaged Goods (CPG), Back-end Development, FastAPI, Extensions, Data, Streaming Data, Data Governance, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Multithreading, Entity Relationships, Software Design, Workflow, API Design, AWS Cloud Architecture, Performance Tuning, Amazon API Gateway, SSH, Cryptography, Research, Data Warehousing, Data Visualization, Metabase, Google Data Studio, CI/CD Pipelines, GitHub Actions, Scripting Languages, Data-driven Dashboards, Azure Data Factory, Technical Project Management, Azure Data Lake, Azure Databricks, Business Analysis, Tesseract, QGIS, OpenAI GPT-3 API, Neural Networks, eCommerce APIs, GPT, LangChain, SharePoint Online, Data Auditing, Business Architecture, Enterprise Architecture, Mathematics, Kedro, Amazon Neptune, Snowpark, Dataproc, Credit Modeling, OpenAI
Education
Bachelor's Degree in Computer Science
Gadjah Mada University - Yogyakarta, Indonesia
Certifications
Infrastructure Automation with Terraform Cloud
Udemy
Google Cloud Professional Data Engineer
Udemy