Hi, my name is Ray!

I'm an Engineer | Web Developer | Educator | Data Enthusiast.

Know more

About me

Mechanical Engineer | Web Developer | Data Enthusiast

Hi there! I'm a mechanical engineering graduate who ventured into web development and discovered a passion for data science along the way. For nearly two years, I've been building websites and applications using frameworks like Vue, React, and Next.js, making projects smoother and more efficient.

These days, I'm diving deeper into data science, machine learning fundamentals, and data engineering: learning how to create and maintain data pipelines; how to build, evaluate, and deploy models; and how to bring data insights into development. I love how data combines engineering and analytics to solve real-world problems, and I'm excited about how these skills can enhance my work even more.

Outside of coding, you'll usually find me with a light novel, manga, or manhwa, or enjoying coffee shop vibes while exploring the latest in tech.

Let’s connect if you're into data, development, or just want to chat about new tech trends!

View Resume

Data Engineering and Data Science/Machine Learning Projects

Stack Overflow End-to-End Data Pipeline

This project analyzes 14 years of Stack Overflow Developer Survey data to uncover valuable insights into technology trends, developer experiences, and industry shifts. The analysis covers a wide range of topics, including programming languages, salary distribution, education demographics, job roles, and predictions for future tech trends.

Stack:

    Python: Core language for data extraction, transformation, and loading (ETL) scripts.

    Docker: Containerizes the Airflow environment and dbt transformations.

    Apache Airflow: Manages and schedules data pipeline workflows.

    dbt (data build tool): Models and transforms data inside BigQuery.

    Terraform: Provisions GCS buckets and BigQuery datasets/tables as infrastructure as code.

    Google Cloud Storage (GCS): Stores raw, cleaned, and transformed data files.

    Google BigQuery: Hosts final structured datasets ready for analytics.

    PySpark: Cleans and standardizes raw survey data across multiple years.

    Pandas: Used for lightweight transformations and quick data manipulations where appropriate.

Source Code

Amazon Sales Data Analysis

This project processes and analyzes Amazon sales data to generate insightful metrics and visualizations, including sales performance, return rates, profit margins, and fee analysis.

Features:

    Data Processing: Extracts detailed fee information from raw Amazon sales data.

    Sales Metrics: Calculates total sales, net proceeds, return rates, and profit margins.

    Visualizations: Generates bar charts for sales, return rates, profit margins, and fee analysis.
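
As a rough illustration of the sales metrics listed above, here is a minimal sketch in plain Python. It is not taken from the project's source: the `OrderLine` fields, the `summarize` helper, and the metric definitions (net proceeds as sales minus fees, profit margin as profit over total sales, return rate as returned rows over all rows) are assumptions made for the example, and the sample rows are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OrderLine:
    """One row of sales data; every field name here is hypothetical."""
    sale_amount: float  # gross amount charged to the customer
    fees: float         # total marketplace fees for the row
    cost: float         # cost of goods sold for the row
    returned: bool      # True if the order was returned


def summarize(rows: list[OrderLine]) -> dict[str, float]:
    """Compute headline metrics under the assumed definitions above."""
    total_sales = sum(r.sale_amount for r in rows)
    net_proceeds = total_sales - sum(r.fees for r in rows)
    profit = net_proceeds - sum(r.cost for r in rows)
    return {
        "total_sales": total_sales,
        "net_proceeds": net_proceeds,
        # Share of rows flagged as returned; 0.0 for an empty input.
        "return_rate": sum(r.returned for r in rows) / len(rows) if rows else 0.0,
        # Profit as a fraction of gross sales; 0.0 when there are no sales.
        "profit_margin": profit / total_sales if total_sales else 0.0,
    }


# Made-up sample rows, for illustration only.
rows = [
    OrderLine(100.0, 15.0, 50.0, False),
    OrderLine(50.0, 10.0, 20.0, True),
    OrderLine(50.0, 5.0, 30.0, False),
    OrderLine(200.0, 30.0, 100.0, False),
]
metrics = summarize(rows)
```

The resulting dictionary could then feed a bar chart per metric; the charting step is left out here because the project's plotting library is not stated.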

Source Code

Weather Data Pipeline

This project collects, processes, and analyzes weather data to enable insightful visualizations and trend analysis using modern data tools.

Features:

    Data Collection: Fetches real-time and historical weather data from a public API.

    Data Storage and Processing: Cleans and stores data in PostgreSQL using Python ETL scripts within Docker containers.

    Visualizations: Uses Metabase to create dashboards showing temperature trends, humidity levels, and weather anomalies.

Source Code

Machine Learning Zoomcamp Coursework

This project is a collection of coursework from the Machine Learning Zoomcamp, covering various topics in machine learning and data science.

Modules:

    Module 1: Introduction to Machine Learning

    Module 2: Machine Learning for Regression

    Module 3: Machine Learning for Classification

    Module 4: Evaluation Metrics

    Module 5: Deploying ML Models

    Module 6: Decision Trees & Ensemble Learning

    Module 7: Neural Networks & Deep Learning

    Module 8: Serverless Deep Learning

    Module 9: Kubernetes & TensorFlow Serving

Coursework | Course

Contact

Need to know more?

CONNECT HERE!