PORTFOLIO 2026

MOHAMMAD SALEM

AI + DEVELOPMENT

DATA SCIENCE

01 — TECHNICAL ARSENAL

PYTHON · TENSORFLOW · XGBOOST · SKLEARN · PANDAS · STREAMLIT · TABLEAU · POSTGRESQL · SUPABASE · OPENAI · LLAMAINDEX · WHISPER · REACT NATIVE
MACHINE LEARNING · DEEP LEARNING · NLP · COMPUTER VISION · RAG · OCR · PREDICTIVE MODELING · TIME SERIES · STATISTICAL MODELING · GEOSPATIAL ANALYSIS · GENERATIVE AI · AI-ASSISTED DEVELOPMENT · PRODUCT DEVELOPMENT · DATA VISUALIZATION

02 — PROFESSIONAL EXPERIENCE

The Journey

Building AI pipelines to process pharmaceutical vendor files using OCR (PyMuPDF, Tesseract, PaddleOCR). Currently implementing bounding-box-based text extraction and date normalization.

Developing a RAG-based document retrieval system with LlamaIndex and open-source LLMs (Mistral, Phi-2), optimized by tuning chunking parameters.

Conducting end-to-end evaluation of the document intelligence system—benchmarking OCR accuracy, RAG retrieval quality, and routing performance.

Remote · Externship | Generative AI · LLM · OCR · RAG · LlamaIndex · PaddleOCR
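To illustrate what a date-normalization step in such a pipeline can look like, here is a minimal sketch that maps a few common date spellings from OCR output to ISO 8601 (YYYY-MM-DD). The accepted formats in `DATE_FORMATS`, the US month-first reading of slash dates, and the name `normalize_date` are all assumptions for illustration, not the pipeline's actual rules.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical list of accepted formats; a real pipeline would tune this to its vendor files.
# Slash dates are assumed to be month-first (US order).
DATE_FORMATS = [
    "%Y-%m-%d",   # 2024-03-05
    "%m/%d/%Y",   # 03/05/2024
    "%m/%d/%y",   # 03/05/24
    "%d %b %Y",   # 05 Mar 2024
    "%b %d, %Y",  # Mar 5, 2024
    "%B %d, %Y",  # March 5, 2024
]


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as an ISO 8601 string, or None if no format matches."""
    # OCR output often carries extra whitespace and stray periods (e.g. "Mar.  5, 2024").
    cleaned = re.sub(r"\s+", " ", raw).replace(".", "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None  # leave unparseable dates for review instead of guessing
```

Returning `None` rather than a best guess keeps ambiguous dates out of downstream data, where they could be routed to manual review.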

Developing a machine learning classifier to identify pre-seizure physiological patterns by analyzing large-scale EKG/ECG datasets from wearable monitors.

Engineering and training neural networks on multi-dimensional time-series data to detect subtle autonomic shifts that precede clinical seizure onset.

Optimizing model performance for real-time inference on smartwatch hardware and edge devices.

Montclair, NJ · Hybrid · Internship | Healthcare AI · Time Series Analysis · Deep Learning · Edge Computing
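One standard way to quantify autonomic activity from ECG data is heart-rate variability, for example RMSSD (root mean square of successive differences between RR intervals). As an illustrative sketch only, assuming RR intervals in milliseconds have already been extracted from the ECG, the snippet below computes RMSSD over sliding windows, the kind of time-series feature a seizure-prediction classifier could consume. The feature choice, window sizes, and function names are assumptions, not the project's actual pipeline.

```python
import math
from typing import List, Sequence


def rmssd(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def windowed_rmssd(rr_ms: Sequence[float], window: int = 30, step: int = 10) -> List[float]:
    """RMSSD over sliding windows of `window` beats, advancing `step` beats each time.

    Window and step sizes are hypothetical defaults chosen for illustration.
    """
    return [rmssd(rr_ms[i:i + window]) for i in range(0, len(rr_ms) - window + 1, step)]
```

A sequence of these per-window values forms a simple feature series; a real model would likely combine it with other signals.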

Collaborated with the Chief of Obstetrics to translate clinical protocols into app features like BP triggers, observation timers, and treatment windows.

Co-developed core app logic for role-based notifications, enabling real-time coordination between nurses and residents.

Delivered a winning pitch and live demo to hospital stakeholders, focusing on reducing 'door-to-needle' time.

New Brunswick, NJ · Hackathon Win | Healthcare Innovation · Mobile Dev · Product Design · Clinical Workflow

Designed a content processing pipeline using Whisper transcription and AI models to generate notes, flashcards, quizzes, and podcasts from lectures.

Built a RAG-based Q&A system enabling users to query their saved content with cited answers, improving retention and study efficiency.

Managed end-to-end product lifecycle: App Store compliance, monetization strategy, user analytics, and privacy infrastructure.

Led the product from user research through UI/UX design, using AI tools throughout development.

Remote · Self-Start | AI-Assisted Dev · Product Design · Whisper · RAG · App Store
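Cited answers depend on each retrieved passage remembering where it came from. A minimal sketch of that chunking step follows, assuming fixed-size word windows with overlap; `chunk_size` and `overlap` are the kind of parameters chunk tuning adjusts. The names and default values are hypothetical, and this is a plain word-based splitter, not LlamaIndex's own text splitters. Embedding and vector retrieval are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    source_id: str   # hypothetical ID of the saved lecture or note, used for citations
    start_word: int  # offset of the chunk's first word in the source text
    text: str

    def citation(self) -> str:
        return f"[{self.source_id}, word {self.start_word}]"


def chunk_text(source_id: str, text: str, chunk_size: int = 200, overlap: int = 40) -> List[Chunk]:
    """Split text into overlapping word windows that keep their source position."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(source_id, start, " ".join(words[start:start + chunk_size])))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk, at the cost of some duplicated text in the index.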

Designed a multi-provider waterfall enrichment process that achieved a verified-email rate above 96%, significantly improving data quality.

Reduced outreach and list-building costs by >45% by replacing manual sourcing with automated data-driven systems.

Standardized targeting processes by creating reusable filtering templates and documentation for future campaigns.

Wayne, NJ · Hybrid · Internship | Data Engineering · Process Automation · Workflow Optimization · ROI Analysis
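A waterfall enrichment queries data providers in a fixed order and stops at the first result that passes verification, so providers further down the list are only used when earlier ones miss. The sketch below assumes each provider is a plain function returning a candidate email or `None`, plus a separate `verify` callback; the provider functions and callback are hypothetical stand-ins, not real vendor APIs.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical provider signature: (full_name, company_domain) -> candidate email or None.
Provider = Callable[[str, str], Optional[str]]


def waterfall_enrich(
    full_name: str,
    domain: str,
    providers: Sequence[Tuple[str, Provider]],
    verify: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try providers in order; return (email, provider_name) for the first verified hit."""
    for name, lookup in providers:
        candidate = lookup(full_name, domain)
        if candidate and verify(candidate):
            # Stop early so providers further down the list are never queried (or billed).
            return candidate, name
        # No result, or it failed verification: fall through to the next provider.
    return None, None
```

Recording which provider produced each address makes it straightforward to compare hit rates and cost per verified email across providers.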

Cleaned and standardized client databases to unlock new marketing channels and improve customer segmentation.

Generated geographical heat maps of sales performance, identifying 7+ high-value market expansion opportunities.

Developed an XGBoost machine learning model to predict future purchase categories, deploying it via a Streamlit dashboard.

Managed end-to-end sign production projects while simultaneously leading data visualization efforts.

Clifton, NJ · Part-Time | Python · XGBoost · Streamlit · Tableau · Operations

03 — SKILLS & TECHNOLOGIES

Technical Expertise

A comprehensive toolkit for solving complex problems across the data science and AI landscape

AI/ML Engineering

Building intelligent systems that learn and adapt

RAG Systems: OpenAI, LlamaIndex, LangChain
Computer Vision: Tesseract, PaddleOCR, PyMuPDF
Deep Learning: TensorFlow, Neural Networks
Classical ML: XGBoost, Random Forest, Scikit-learn
LLMs: Fine-tuning, Prompt Engineering, Evaluation

Data Analytics & Insights

Transforming data into actionable intelligence

Analysis: Python (Pandas, NumPy)
Visualization: Tableau, Seaborn, Matplotlib
Statistical Modeling: statsmodels, Hypothesis Testing
Time Series: EKG/ECG Analysis, Forecasting

AI-Assisted Product Development

Leveraging AI tools to build and ship real products

Workflow: AI-Assisted Development, Prompt Engineering
Deployment: App Store, Streamlit, Cloudflare
Backend: Supabase, PostgreSQL, Edge Functions
Product: UX Design, Architecture, Monetization

Data Engineering

Building robust pipelines for data processing

Databases: PostgreSQL, Supabase
ETL Pipelines: Python, Pandas
Data Quality: Validation, Enrichment
OCR Processing: Multi-provider waterfall systems

04 — EDUCATION

Academic Foundation

Montclair State University

Bachelor of Science — Data Science

Minor in Mathematics

Sep 2022 – Apr 2026

CUMULATIVE GPA

3.61

HONORS & AWARDS

Presidential Scholarship

Merit-based academic scholarship

Dean's List

4x recipient

FROM CLASSROOM TO PRODUCTION

Course projects evolved into production-grade analytics and ML systems, including NYC Vehicle Collision Analysis (2M+ records, K-Means clustering), USAID Anti-Corruption Analysis, and Tableau Sales Analytics (interactive dashboards, geospatial mapping).

RELEVANT COURSEWORK

AI & MACHINE LEARNING

Machine Learning
Data Mining
Deep Learning
AI for Cybersecurity
Undergraduate Research

DATA SCIENCE

Advanced Data Science
Data Science & Statistics
Data Visualization
Python Programming
R Programming

CS FUNDAMENTALS

Data Structures & Algorithms
Database Systems
Software Engineering
Computer Systems

MATHEMATICS

Linear Algebra
Multivariable Calculus
Probability & Discrete Math

05 — PERSPECTIVE

My Approach

Building systems that solve real problems.

Turning complex methods into clean code.

Always learning, always iterating.

Bridging the gap between models and users.

Passionate about reliable AI engineering.