Data Analyst and Research Engineer with 8+ years building full-stack data applications — from anonymisation engines and policy analytics platforms to AI-powered dashboards and R packages — for EU institutions, government agencies, and research projects. Core stack: Python, R, SQL, Power BI, cloud deployment.
PhD in Quantitative Ecology & Statistical Modelling (University of Copenhagen, Marie Curie Fellowship). External Expert for the EU Scientific, Technical and Economic Committee for Fisheries (STECF) since 2017, contributing to stock assessments, regulatory impact analyses, and fisheries data infrastructure across 10+ Expert Working Groups. Currently working on European Commission projects (DG EMPL) involving AI-powered analytics and digital transformation.
Autonomous AI agent that decodes EU regulatory text, extracts compliance timelines, analyses ambiguities, contradictions, stakeholder impacts, and cross-references, then delivers executive briefings. Three modes: single-regulation deep analysis, side-by-side document comparison (mapping transposition gaps, gold-plating, contradictions, and national additions), and corpus-wide Q&A with synthesised intelligence briefings.
Stack: Python · Gemini API · Cloud
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Backend | FastAPI, SSE streaming for real-time agent log, async pipeline generators |
| LLM | Gemini (analysis, query expansion, comparison) with structured JSON output |
| PDF processing | pymupdf extraction, article-level regex chunking, compress-then-merge strategy |
| EUR-Lex integration | SPARQL (EuroVoc subject search + citation graph + title keyword) via Cellar endpoint, REST for full text fetch by CELEX |
| Search intelligence | LLM-powered query expansion: natural language → EuroVoc terms + title keywords + CELEX candidates, all verified against EUR-Lex |
| Embeddings | sentence-transformers (BGE / legal-SBERT configurable), cosine similarity retrieval |
| RAG | Semantic search over saved chunk embeddings, context injection into LLM prompts, library-aware query enhancement |
| Compare | Embedding-assisted provision pre-matching (similarity matrix, greedy assignment), gap/gold-plating detection before LLM merge |
| Document storage | SQLite with swappable backend (MongoDB stub), fingerprint-based deduplication, auto-generated smart tags |
| Smart tags | Auto-extracted: domain, jurisdiction level, country, regulation type, topics, related CELEX, implements (parent regulation linkage) |
| Cross-references | CELEX extraction from LLM output + regex fallback, EUR-Lex verification, library status badges, one-click analyse-from-search |
| Frontend | Vanilla HTML/CSS/JS, dark “classified dossier” aesthetic, agent log with animated reasoning trace |
| Demo mode | 3 pre-loaded regulations + 3 policy questions + 1 transposition comparison, fully offline |
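The embedding-assisted provision pre-matching in the Compare row boils down to a cosine-similarity matrix plus greedy assignment. A minimal NumPy sketch of that idea (function name, threshold, and the gap/addition interpretation are mine, not the project's actual code):

```python
import numpy as np

def greedy_prematch(emb_a, emb_b, threshold=0.5):
    """Greedily pair provisions of document A with provisions of
    document B by cosine similarity; whatever stays unmatched surfaces
    as a candidate transposition gap or national addition."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T                                  # cosine similarity matrix
    pairs, used_a, used_b = [], set(), set()
    # Visit candidate pairs from most to least similar
    for i, j in sorted(np.ndindex(sim.shape), key=lambda ij: -sim[ij]):
        if sim[i, j] < threshold:
            break
        if i not in used_a and j not in used_b:
            pairs.append((i, j, float(sim[i, j])))
            used_a.add(i); used_b.add(j)
    gaps_a = [i for i in range(len(a)) if i not in used_a]  # only in A
    gaps_b = [j for j in range(len(b)) if j not in used_b]  # only in B
    return pairs, gaps_a, gaps_b
```

Unmatched indices on the EU side suggest transposition gaps; unmatched indices on the national side suggest national additions or gold-plating, which the LLM then inspects.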
Interactive analysis of 5 million catch records across 23 EU Member States and 12 years of STECF fisheries data (2013–2024). Evaluates the Landing Obligation’s impact on discards, detects fleet behavioural shifts using PCA and clustering, and delivers everything as a single self-contained HTML report with linked dashboards — no server required.
Five crosstalk-linked dashboards with Sankey diagrams, sunburst charts, drill-down visualisations, and searchable tables. Includes species-level discard analysis with EWG 25-10 exemption mapping and a complete ML pipeline (Hellinger transform → PCA → silhouette-optimised k-means → transition flows).
Stack: R · R Markdown · Plotly · Highcharter · Crosstalk · Reactable · echarts4r · ggiraph · cluster · vegan
Live Report · GitHub

| Layer | Technology |
|---|---|
| Data pipeline | data.table + dplyr: raw CSVs → aggregated .rds files via prepare_data.R |
| Visualisation | Plotly (stacked areas, Sankey, sunburst, treemap), Highcharter (drill-down), ggiraph (hover heatmaps), echarts4r (animated races) |
| Interactivity | 5 crosstalk-linked dashboards: SharedData groups linking plotly charts + reactable/DT tables via filter_select/filter_slider |
| Discard analysis | Species-level discard ratios, LO coverage flags, before/after comparison, exemption treemap hierarchy |
| ML pipeline | Hellinger transform → PCA (compositional) → silhouette-based optimal k → k-means → Sankey transitions |
| Data source | STECF FDI 2025 data call (EWG 25-10) + Annex 3 exemptions Excel (multi-sheet, merged-cell headers) |
| Output | Self-contained HTML (12 MB), all JS/CSS/data embedded, code-folded, floating TOC |
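The ML pipeline row (Hellinger transform → PCA → silhouette-optimised k-means) runs in R in the report; purely as an illustration of the same steps, a minimal scikit-learn sketch (function name and defaults are mine):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_fleets(catch_matrix, k_range=range(2, 6), seed=1):
    """Compositional clustering: Hellinger transform (square root of
    row proportions) -> PCA -> k-means with silhouette-selected k."""
    X = np.asarray(catch_matrix, dtype=float)
    hel = np.sqrt(X / X.sum(axis=1, keepdims=True))   # Hellinger transform
    scores = PCA(n_components=2).fit_transform(hel)
    best = max(
        (KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scores)
         for k in k_range),
        key=lambda km: silhouette_score(scores, km.labels_),
    )
    return best.labels_
```

The Hellinger step matters because raw catch compositions are proportions: it downweights dominant species so Euclidean-distance methods (PCA, k-means) behave sensibly on compositional data.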
No-code analytics platform with an autonomous AI agent that explores your data without you clicking anything. The Auto-Discover module runs a full ReAct loop (Plan, Execute, Reflect, Narrate) — it generates hypotheses, runs statistical tests across 18 safe analytical actions, reflects on findings, iterates with follow-up analyses, and delivers an executive intelligence report with reproducible R code. Multi-table support with automatic join detection. No eval(), no parse() — the LLM can only trigger whitelisted statistical operations.
Three AI layers: an autonomous exploration agent (Gemini 2.5 Flash), a rule-based engine that works offline with no API key, and an LLM interpreter that synthesises results across all modules.
Around that AI core, the full analytics pipeline: automated EDA, parametric/non-parametric testing with auto post-hoc comparisons, 9 ML algorithms with full interpretability (DALEX, LIME, SHAP, PDP), PCA/clustering, publication-ready ggstatsplot output with embedded APA-style results, network analysis, pivot tables, and interactive drag-and-drop visualisation. 40+ R packages unified in a modular architecture with 160+ tests.
Stack: R · Shiny · bslib · caret · DALEX · lime · ingredients · Gemini 2.5 Flash · ggstatsplot · plotly · ggplot2 · GWalkR · esquisse · FactoMineR · reactable · igraph · networkD3 · testthat
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Framework | Shiny, bslib (Bootstrap 5, Flatly theme), responsive sidebar layouts, shinycssloaders |
| Tables | reactable (color-coded p-values, inline bar charts, significance stars), DT |
| Visualisation | ggplot2, plotly, ggpubr, ggstatsplot (APA-style publication plots), esquisse (code export), GWalkR (Tableau-like), corrplot, VIM |
| Networks | ggraph, igraph (correlation networks), networkD3 (Sankey flow diagrams) |
| Statistical testing | rstatix, infer (permutation tests + bootstrap CIs), broom — auto-selected tests with post-hoc comparisons and assumption checks |
| ML pipeline | caret with 9 model types, preprocessing (imputation, NZV, correlated feature filtering, Box-Cox/Yeo-Johnson, PCA), k-fold/repeated CV/LOOCV/bootstrap, auto and custom tuning grids |
| Interpretability | DALEX permutation importance, PDP + ALE + ICE (ingredients), LIME, Break Down, SHAP, interactions + Johnson-Neyman, easystats diagnostics, sjPlot + modelsummary |
| Auto-EDA | skimr, dlookr, SmartEDA, DataExplorer — target analysis with effect sizes (Cohen’s d, eta-squared, Cramér’s V) |
| Classification | pROC (ROC curves), confusion matrix metrics, per-class performance, rpart.plot (tree visualization) |
| AI Assistant | Rule-based engine (pure R, no API) + Gemini API (free tier) via httr/jsonlite — guided analysis, auto-EDA pipeline, cross-cutting results interpretation |
| PCA & Clustering | FactoMineR + factoextra (PCA, FAMD, biplots, cos2 maps), k-means + hierarchical, Hopkins statistic, elbow/silhouette/gap analysis |
| Agentic AI | Custom ReAct loop (plan → execute → reflect → narrate), Gemini API, structured JSON action dispatch, iterative hypothesis testing |
| Multi-table | Automatic join detection (name matching, FK patterns, value overlap analysis), cross-table hypothesis generation, safe merge with collision handling |
| Start screen | Two-mode app architecture: Classic (13+ tabs, user-driven) and Auto-Discover (streamlined, AI-driven), renderUI-based mode switching with shared reactive state |
| Reporting | R Markdown HTML reports with saved analyses, plots, narratives, and model performance tables |
| Testing | testthat + testServer: unit tests, AI rules engine tests, module tests, E2E ML+interpretability tests, shinytest2 integration tests |
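The "no eval(), no parse()" guarantee described above amounts to a dispatch table: the LLM emits a structured JSON action name plus arguments, and only whitelisted callables ever run. A minimal sketch (action names and the data shape are illustrative, not the app's actual registry):

```python
import statistics

# Whitelisted analytical actions: the LLM can only name one of these
# and pass arguments -- nothing it writes is ever eval()'d or parse()'d.
ACTIONS = {
    "mean":   lambda col, data: statistics.mean(data[col]),
    "sd":     lambda col, data: statistics.stdev(data[col]),
    "median": lambda col, data: statistics.median(data[col]),
}

def dispatch(request, data):
    """Execute one structured-JSON action request from the LLM."""
    name = request.get("action")
    if name not in ACTIONS:                    # whitelist gate
        raise ValueError(f"Action not allowed: {name!r}")
    args = {k: v for k, v in request.items() if k != "action"}
    return ACTIONS[name](**args, data=data)
```

Anything outside the registry, including attempts at code injection, fails closed with an error the agent can reflect on in its next loop iteration.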
R package that migrates entire Excel workbooks to R in one step. Upload any multi-tab .xlsx — the package extracts every formula, translates 62 Excel functions to R equivalents, resolves cross-sheet references via topological sort, exports raw data as tidy CSVs, and produces a standalone R script that uses only base R with zero dependencies. Built-in verification compares every computed value against Excel’s cached results before you commit to the migration.
One-liner API: excel2r::migrate("workbook.xlsx", "output/"). Also includes a 5-step interactive Shiny app (excel2r::run_app()). Installable via remotes::install_github("emantzoo/excel2r").
150+ tests, CI on 5 platforms, vignette included.
Stack: R · tidyxl · openxlsx2 · Shiny · bslib
Live Demo · GitHub · Blog Post

| Layer | Technology |
|---|---|
| Package API | 5 exported functions: migrate(), process(), verify(), supported_functions(), run_app() |
| Frontend | Shiny + bslib (Bootstrap 5, Flatly theme), DT for interactive formula review, 5-tab workflow |
| Parser | Custom balanced-parenthesis tokenizer, right-to-left safe replacement, string-literal awareness |
| Reference transform | Cell refs → R syntax (D10 → Sheet$D[10]), cross-sheet, absolute $ refs, whole-column ranges |
| Function registry | 62 Excel functions mapped to R equivalents, per-function handler with error wrapping |
| Dependency engine | Kahn’s topological sort — two-level: sheet ordering first, then cell ordering within sheets |
| CSV export | Tidy long-format (row, col, value) per sheet, filtered to only formula-referenced cells, blank input slots preserved |
| Grid reconstruction | reconstruct_grid() helper embedded in generated script rebuilds positional data frame from tidy CSV |
| Verification | Runs generated script in isolated environment, compares R values vs Excel cached results cell by cell, classifies matches/precision diffs/real mismatches |
| Named tables | Auto-detection via openxlsx2::wb_get_tables(), header extraction, named data frame generation |
| Conditional aggregation | Custom SUMIF/SUMIFS/COUNTIF/COUNTIFS/AVERAGEIF/AVERAGEIFS helpers embedded in output (no external dependency) |
| Lookup helpers | VLOOKUP (exact + approximate match), HLOOKUP, INDEX, MATCH, XLOOKUP |
| Script generation | Self-contained .R output — Excel mode (reads .xlsx) or CSV mode (base R only, zero dependencies) |
| CI | GitHub Actions on macOS, Windows, Ubuntu (R release, devel, oldrel) — R CMD check with 0 errors, 0 warnings |
| Testing | 150+ tests: unit (parser, transforms, dependency ordering, CSV export, verification), integration, API tests |
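Kahn's topological sort from the dependency-engine row, in minimal single-level form (the package applies it twice, first to sheets and then to cells within a sheet); here each cell maps to the cells its formula references, and the output is a safe evaluation order:

```python
from collections import deque

def kahn_order(deps):
    """Kahn's algorithm: deps maps each cell to the cells its formula
    references (all referenced cells assumed present as keys). Returns
    an evaluation order, or raises on circular references."""
    indegree = {n: 0 for n in deps}
    dependents = {n: [] for n in deps}
    for cell, refs in deps.items():
        for ref in refs:
            indegree[cell] += 1
            dependents[ref].append(cell)
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(deps):          # leftover nodes form a cycle
        raise ValueError("Circular reference detected")
    return order
```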
Real-time behavioural risk intelligence dashboard for the Mediterranean Sea. Ingests AIS gap, encounter and loitering events from the Global Fishing Watch API and scores each event using a custom formula weighing duration, event type, flag-state risk and offshore location. 11 analytical tabs expose patterns from daily risk trends to encounter proximity analysis and gap speed profiling.
Features an embedded AI Maritime Analyst (Gemini 2.5 Flash) with RAG-injected domain knowledge covering IUU fishing, Mediterranean geography and flag-state risk context.
Stack: Python · Streamlit · Pandas · Plotly · Folium · GFW Events API · Google Gemini · RAG
Live Demo · GitHub

| Layer | Technology |
|---|---|
| Frontend | Streamlit (wide layout, 11 tabs), Folium interactive map, Plotly charts |
| Data ingestion | GFW Events API (async Python client), GeoJSON polygon filtering for Mediterranean |
| Risk model | Custom formula: (duration^0.75) × event_weight × flag_multiplier × offshore_bonus |
| Analytics | Duration histograms, flag×event heatmap, gap speed profiling, encounter proximity, Med zone breakdown, EEZ analysis, repeat offender detection |
| AI analyst | Gemini 2.5 Flash with RAG (4 knowledge docs), sandboxed code execution, safety checks |
| Data | 23-column enriched dataset: vessel identity, distances, gap/encounter/loitering-specific fields |
| Deployment | Streamlit Cloud, secrets management, static fallback dataset |
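The risk formula is stated directly in the table; as a sketch, with illustrative placeholder weights (the dashboard's calibrated event weights and flag multipliers are not reproduced here):

```python
def risk_score(duration_h, event_type, flag_multiplier, offshore):
    """Score one AIS event with the stated formula:
    (duration^0.75) x event_weight x flag_multiplier x offshore_bonus.
    Event weights and the offshore bonus below are illustrative."""
    event_weight = {"gap": 1.5, "encounter": 1.2, "loitering": 1.0}[event_type]
    offshore_bonus = 1.25 if offshore else 1.0
    return (duration_h ** 0.75) * event_weight * flag_multiplier * offshore_bonus
```

The sub-linear duration exponent (0.75) keeps a handful of very long events from completely dominating the ranking while still rewarding persistence.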
Production web application for the PLANT-B research project (PRIMA grant, Benaki Phytopathological Institute) used by agronomists across 4 Mediterranean countries. Converts pesticide usage data into real-time environmental impact scores across biodiversity, ecotoxicology, pollution, and human health compartments — replacing a manual workflow of Excel spreadsheets and offline R scripts.
SQLite backend with 15+ tables and 9,000+ reference records, 15 Shiny modules with reactive state management, and Dockerised deployment on Google Cloud Run.
Stack: R · Shiny · bslib · plotly · ggplot2 · SQLite · Docker · Google Cloud Run
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Frontend | R/Shiny, bslib (Bootstrap 5), dark mode, CSS custom properties |
| Visualisation | plotly, ggplot2, reactable (React-based tables), DT |
| Database | SQLite (WAL mode) — 15+ tables, 9,000+ reference records (PPP products, active substances, ecotox thresholds) |
| Architecture | 15 Shiny modules with reactive state management, shared scoring engine adapter pattern |
| Scoring engine | Custom R pipeline: IAP_PREP, triplet decomposition, weighted rank scoring, compartment aggregation with Scoring Impact Factors |
| Auth | shinymanager with encrypted credentials (disabled for public demo) |
| Deployment | Docker (rocker/shiny base), Google Cloud Run (europe-west1), Artifact Registry |
| Testing | testthat, testServer, shinytest2 (unit + integration + E2E) |
The unified web application reduced the time from data entry to environmental impact assessment from hours to seconds.
Interactive diagnostic dashboard that takes raw survey data from Greek businesses (Likert scales, checklists, ratings across 11 question blocks) and produces SWOT classifications, priority matrices, gap analyses, and implementation roadmaps. Three data-entry paths — Greek questionnaire, Excel upload, or AI conversational assessment.
15 reactive computations, three independent recommendation engines (rule-based, dynamic, priority-based), six interactive drill-down charts, and per-tab AI narrative generation via Cerebras LLM.
Stack: R · Shiny · bslib · ggplot2 · plotly · ggiraph · DT · Cerebras API · Docker
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Frontend | Shiny + bslib (Bootstrap 5, Flatly theme), custom CSS, shinycssloaders, 12 modular UI components |
| Visualisation | ggplot2, plotly (click events + tooltips), ggiraph (interactive lollipop), DT, reactable |
| Analysis engine | 15 reactive computations: attribute join, SWOT, dimension, maturity, priority, gap, roadmap, heatmap, bottleneck, risk, funnel, dependency, comparison. bindCache on heavy computations |
| Recommendations | Rule-based (business_rules.xlsx pattern matching), dynamic (score-category lookup), priority-based (quadrant extraction + ranking) |
| Narrative generation | 4 template-based generators (100K+ lines of rule-based text), 11 tab-specific LLM prompt builders |
| AI assessment | Role-adaptive conversational scoring (CEO/CTO/Consultant personas), dimension-by-dimension prompts, JSON response parsing |
| LLM integration | Cerebras API, llama3.1-8b, temperature 0.3, 120s timeout, 2 retries with 5s backoff |
| Server architecture | Orchestrator pattern: app.R sources UI components, reactives module, outputs module, AI assessment module |
Replaces static questionnaires with a dynamic diagnostic engine for retail SMEs. Routes 44 questions adaptively toward the highest-scoring unconfirmed pain cluster using a 7-cluster Bayesian-inspired scoring model with sector priors — assessments complete in under 5 minutes.
19 solutions filtered by hard constraints (budget, time, support) before composite scoring, a 20-node digital twin process map with R-expression status rules, and three audience-specific outputs: plain language for SME owners, evidence tables for advisors, and SWOT/radar/roadmap for analysts.
Stack: R · Shiny · bslib · ggplot2 · plotly · fmsb · gt · pagedown · Docker · Cloud Run
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Frontend | Shiny + bslib (Bootstrap 5), shinyjs auto-advance, custom CSS (choice tiles, symptom cards, progress indicators) |
| Scoring engine | 7-cluster weighted accumulation, sector priors, flag-driven conditional routing, confirmation threshold logic |
| Adaptive routing | get_next_question: filter by condition/skip_if/maturity/sector, prioritize highest unconfirmed cluster, exit on 2 confirmed or 15 answered |
| Solution matching | Hard filters (budget, time, support, sector, excluded_if expressions) + soft composite scoring (pain × 0.4 + budget_fit × 0.3 + time_fit × 0.3) |
| Digital twin | ggplot2 tilemap, 20 nodes × 4 tiers, R-expression status rules, integration gap pair detection |
| Visualisation | ggplot2 + plotly (process map, SWOT bars, gap bars, priority matrix, roadmap tiles), fmsb (radar), ggrepel |
| Reporting | pagedown::chrome_print for PDF, officedown for Word, flextable for formatted tables |
| Data model | 373-column question CSV (13 base + 180 option weights + 180 flags), 19 solutions, 20 process nodes — all logic in CSVs |
| Testing | 180 passing tests (5 unit + 3 browser), build_mock_state/build_completed_state fixtures, shinytest2 E2E |
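The solution-matching row describes hard constraint filters followed by the composite soft score pain × 0.4 + budget_fit × 0.3 + time_fit × 0.3. A minimal sketch of that two-stage pattern (field names are illustrative, not the CSV schema; the app's logic lives in R):

```python
def match_solutions(solutions, profile):
    """Stage 1: hard filters (budget, time, support). Stage 2: rank the
    survivors by the composite score from the project description."""
    eligible = [
        s for s in solutions
        if s["budget"] <= profile["budget"]
        and s["weeks"] <= profile["weeks"]
        and (not s["needs_support"] or profile["has_support"])
    ]
    for s in eligible:
        s["score"] = (profile["pain"].get(s["cluster"], 0) * 0.4
                      + s["budget_fit"] * 0.3
                      + s["time_fit"] * 0.3)
    return sorted(eligible, key=lambda s: s["score"], reverse=True)
```

Filtering before scoring means an unaffordable solution can never outrank an affordable one, no matter how well it matches the pain cluster.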
Code-first Power BI dashboard framework that generates complete multi-page dashboards as PBIR JSON — no drag-and-drop. 18 visual types (cards, gauges, scatter plots, waterfalls, ribbons, maps, matrices, and more), each defined in one line of Python that replaces 15+ lines of boilerplate JSON.
Includes a Claude skill for auto-generating dashboards from raw data: drop CSVs + a one-paragraph business brief, and it designs the star schema, writes DAX measures, and generates the full visual layout script. Four example dashboards (E-Commerce, Hospital, HR, Supply Chain) with 135 DAX measures.
Stack: Power BI · DAX · Python · PBIR/JSON · MCP Server · Claude Skill
GitHub

| Layer | Technology |
|---|---|
| Visual generation | 18 make_* Python functions generating PBIR visual.json files on a 1280×720 canvas |
| Data bindings | Measure field references (DAX measures) + column field references, cross-table bindings |
| Auto-generation | Claude skill: CSV inspection, star schema design, DAX measure writing, visual layout generation from a one-paragraph brief |
| Data model | Star schema with active/inactive relationships, Calendar table, _Measures table for all DAX |
| DAX patterns | LASTDATE snapshot, EARLIER self-join, USERELATIONSHIP, SAMEPERIODLASTYEAR, DATESINPERIOD, POWER, SUMX+RELATED, SWITCH RAG |
| Workflow | Phase 0–1 (data model via MCP or manual) → Phase 2 (Python visual generation) → Phase 3 (open and polish) |
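The one-line-per-visual idea can be sketched as a factory function that returns a JSON-serialisable config. The dict below is a deliberately simplified stand-in, not the actual PBIR visual.json schema (which is far more verbose, hence the 15+ lines of boilerplate each one-liner replaces):

```python
import json

def make_card(measure, x, y, w=200, h=120):
    """One line per KPI card: returns a simplified, illustrative
    visual config keyed by a DAX measure name and canvas position."""
    return {
        "visualType": "card",
        "position": {"x": x, "y": y, "width": w, "height": h},
        "projections": [{"role": "Values", "field": {"measure": measure}}],
    }

# A page is a list of such configs, each written out as a visual.json file
page = [make_card("Total Sales", 40, 40), make_card("Margin %", 260, 40)]
layout = json.dumps(page, indent=2)
```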
End-to-end biostatistical analysis of a simulated greenhouse experiment (RCBD with repeated measures), demonstrating the full workflow from experimental design through mixed models, dose-response curves, and multivariate analysis in a single reproducible R Markdown report.
Covers mixed ANOVA with Dunnett’s test and effect sizes, negative binomial GLMM with simulation-based diagnostics, 4-parameter log-logistic dose-response modelling with bootstrap CIs, and MANOVA/PCA/PERMANOVA.
Stack: R · R Markdown · afex · emmeans · glmmTMB · drc · boot · FactoMineR · vegan · DHARMa · ggdist · gtsummary · bookdown
Live Report · GitHub

| Layer | Technology |
|---|---|
| Experimental design | RCBD: 4 blocks × 6 treatments × 3 pots × 3 time points (216 obs), mock data via multivariate simulation |
| Univariate models | afex::aov_car (mixed ANOVA), glmmTMB (negative binomial GLMM), emmeans (Dunnett’s contrasts) |
| Effect sizes | effectsize: partial η² for ANOVA, Cohen’s d forest plot with 95% CI for treatment vs control |
| Multivariate | MANOVA (Pillai’s trace), FactoMineR PCA with factoextra biplot, vegan::adonis2 PERMANOVA |
| Dose-response | drc::drm LL.4 + W1.4 + W2.4, AIC model comparison, confidence bands, ED10–ED90 with delta CIs |
| Bootstrap | 1000-resample nonparametric bootstrap for ED50 with percentile CI (boot package) |
| Diagnostics | DHARMa simulation residuals (GLMM), performance::check_model (ANOVA), QQ + Cook’s distance |
| Output | bookdown::html_document2 with floating TOC, numbered figures, gtsummary tables, DT interactive data |
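The LL.4 model fitted via drc::drm is the standard 4-parameter log-logistic; written out here in Python for illustration. The useful property: at dose equal to e the response sits exactly midway between the asymptotes, which is why e is read directly as the ED50.

```python
import math

def ll4(dose, b, c, d, e):
    """4-parameter log-logistic in drc's LL.4 parameterisation:
    c = lower asymptote, d = upper asymptote, b = slope,
    e = inflection point (the ED50)."""
    return c + (d - c) / (1 + math.exp(b * (math.log(dose) - math.log(e))))
```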
Tutorial R package with a pkgdown website covering five classical experimental designs used in agricultural research (CRD, RCBD, Latin Square, Factorial, Split-Plot). Each design includes a built-in dataset, a detailed vignette with full analysis pipeline, and a helper function for assumption checking.
Every vignette walks through EDA, model fitting, assumption checks, Tukey HSD, and emmeans contrasts. R CMD check passes with 0 errors, 0 warnings, 0 notes.
Stack: R · ggplot2 · emmeans · lmerTest · pkgdown · GitHub Actions
pkgdown Site · GitHub

| Design | Model | Use Case |
|---|---|---|
| CRD | aov(Y ~ A) | Homogeneous units, single factor |
| RCBD | aov(Y ~ Block + A) | One blocking factor |
| Latin Square | aov(Y ~ Row + Col + A) | Two blocking factors (grid) |
| Factorial | aov(Y ~ A * B) | Crossed factors + interaction |
| Split-Plot | lmer(Y ~ A*B + (1|Rep:A)) | Restricted randomisation |
Helper function: check_assumptions()

Two educational games for Minecraft Education Edition that make classification and search algorithms tangible through gameplay. An AI agent sorts animals using fine-grained vs coarse classification strategies, then navigates a procedurally generated reward grid comparing Random Walk, Greedy, and memory-augmented search — each strategy adding one improvement, making algorithmic thinking incremental.
Includes a self-contained React browser demo (no Minecraft licence needed), Fisher-Yates shuffle adapted for MakeCode constraints, and XOR-based backtrack prevention.
Stack: MakeCode Python · Minecraft Education Edition · React (browser demo)
Live Demo · GitHub

| Layer | Technology |
|---|---|
| Game 1: Sorting | Agent patrols 50 spawned animals, sorts by type (lucky1: 5 pens) or category (lucky2: FARM/EXOTIC) |
| Game 2: Search | 10×10 grid with Stone (+3), Diamond (+5), Lava (-5), Gold (+10); 4 strategies compete |
| Backtrack prevention | XOR with 1 on direction indices (idx ^ 1 flips 0↔1 and 2↔3) gives the opposite direction in O(1) |
| State tracking | Visited blocks transform to colored wool, giving real-time visual trail of agent’s path |
| Shuffle algorithm | Fisher-Yates adapted for MakeCode Python (no len(), no range()) |
| Path evaluation | test_paths runs 5 trials from fixed start with random detours, computing reward/step ratios |
| Browser demo | Self-contained HTML with React 18 + Babel standalone from CDN, no build step |
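The two MakeCode tricks from the table, sketched in plain Python (the game itself is MakeCode Python, which lacks len() and range(), hence the explicit counters):

```python
import random

def opposite(direction_index):
    """XOR with 1 flips 0<->1 and 2<->3 in O(1). With directions
    indexed so opposites form adjacent pairs (e.g. 0=N, 1=S, 2=E, 3=W),
    this single operation is the whole backtrack-prevention check."""
    return direction_index ^ 1

def fisher_yates(items):
    """Fisher-Yates shuffle with a while loop and explicit counters,
    mirroring the MakeCode adaptation (no len()/range() there)."""
    n = 0
    for _ in items:          # count elements without len()
        n += 1
    i = n - 1
    while i > 0:
        j = random.randint(0, i)         # 0 <= j <= i, inclusive
        items[i], items[j] = items[j], items[i]
        i -= 1
    return items
```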
Production anonymisation engine that protects individual-level datasets against re-identification using four methods (k-Anonymity, Local Suppression, PRAM, Noise Addition), selected automatically by a 40+ rule decision engine. Backward-elimination risk analysis drives every downstream choice — from variable classification to per-quasi-identifier (QI) protection parameters.
Features an adaptive retry loop with escalation and cross-method fallbacks, a composite utility score (Pearson correlation, KL divergence, optional ML validation), and optional AI-powered column classification via Cerebras Qwen 235B. Tested against 17 real-world Greek administrative datasets.
Stack: Python · Streamlit · Pandas · Plotly · scikit-learn · R/sdcMicro · Cerebras API
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Frontend | Streamlit, custom CSS, session state management |
| Visualisation | Plotly (risk histograms, variable importance bars, before/after overlays) |
| Risk engine | Custom pipeline: per-record ReID, backward elimination, structural risk, variable importance ranking |
| Protection engine | 4 methods, 40+ selection rules, dynamic pipeline builder, multi-phase retry with escalation + fallbacks |
| Method selection | Rules engine (RC, CAT, LDIV, DATE, QR, LOW, DP, HR rule families), suppression-gated kANON |
| Preprocessing | Type-aware routing (6 priority tiers), adaptive tier loop (light to very aggressive), risk-weighted per-QI cardinality limits |
| Privacy metrics | k-anonymity, l-diversity (distinct + entropy), t-closeness (EMD/TVD), uniqueness rate, disclosure risk |
| AI integration | Cerebras Qwen 235B for column classification and method recommendation (optional) |
| R integration | sdcMicro for optimal local suppression and correlated noise (optional, Python fallback) |
| Testing | pytest (unit + integration), 17-dataset test suite covering real-world Greek administrative data |
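The k-anonymity metric that anchors the privacy-metrics row, as a pandas sketch (the engine's actual per-record pipeline is considerably more involved): k is the size of the smallest group of records sharing the same quasi-identifier combination.

```python
import pandas as pd

def k_anonymity(df, quasi_identifiers):
    """k-anonymity level = size of the smallest equivalence class,
    i.e. the smallest group sharing one QI value combination."""
    return int(df.groupby(quasi_identifiers, dropna=False).size().min())

def risky_records(df, quasi_identifiers, k=2):
    """Records whose QI combination occurs fewer than k times -- the
    per-record re-identification candidates the engine must protect."""
    sizes = (df.groupby(quasi_identifiers, dropna=False)[quasi_identifiers[0]]
               .transform("size"))
    return df[sizes < k]
```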
Policy analytics platform tracking the EU Council Recommendation on Fair Transition across 27 member states. Ingests 1,000+ policy measures from MongoDB and provides 14 interactive visualisations with 11 simultaneous filter dimensions — every AI prompt receives the active filter state so responses always reflect what the user is looking at.
Includes a RAG-based semantic policy search (MongoDB vector search, top-5 retrieval with citations), a 6-section strategic intelligence framework, and AI-powered gap analysis. Deployed on Cloud Run with graceful degradation — the dashboard stays fully functional even without an API key.
Stack: Python · Django · HTMX · MongoDB · Plotly · Tailwind CSS · Cerebras API · Cloud Run
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Frontend | Django 5 + HTMX (partial page loads, no SPA), Tailwind CSS, responsive grid layout |
| Visualisation | 14+ Plotly chart types (choropleth, bar, pie, heatmap, stacked bar), responsive sizing |
| Data pipeline | MongoDB aggregation pipelines with allowDiskUse, QueryBuilder mapping 11 filter dimensions to $match/$unwind/$regex stages |
| AI integration | 8 generation functions: chart narratives, document analysis, in-depth analysis (5 focus modes), Q&A, strategic analysis (6 sections + synthesis), batch/country summaries, gap analysis |
| RAG pipeline | Query embedding, MongoDB $vectorSearch, top-5 retrieval, metadata enrichment, LLM answer with source citations |
| Prompt engineering | 8 specialised system personas, filter-aware context injection, <think> tag stripping for reasoning models |
| Caching | Django LocMem cache with namespace keys (ai:{type}:{md5}), filter-aware invalidation, 1-hour TTL |
| Deployment | Docker (python:3.12-slim), gunicorn (2 workers + 4 threads), Cloud Run (512MB, auto-scale 0-2), MongoDB Atlas |
| Seeding | seed_mongo.py generates 300 docs / 1,050 measures / 80 linkage groups across 14 EU countries |
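The filter-aware cache invalidation from the caching row follows from how the key is built: a sketch of the ai:{type}:{md5} shape, where the hash covers the prompt plus the active filter state (the exact serialisation here is illustrative):

```python
import hashlib

def cache_key(ai_type, prompt, filters):
    """Namespaced cache key: because the md5 digest covers the active
    filter state, changing any filter yields a new key, so a cached AI
    answer can never be served against the wrong filter selection."""
    payload = prompt + "|" + "|".join(f"{k}={filters[k]}" for k in sorted(filters))
    return f"ai:{ai_type}:{hashlib.md5(payload.encode()).hexdigest()}"
```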
A clinician asks a question in plain English; the agent writes Python code to analyse wearable-sensor gait data, executes it in a sandbox, and returns a clinically contextualised answer with a full reasoning trace. Built on the PHIA pattern (Nature Communications, Jan 2026) — the first application of this approach to Parkinson's disease (PD) gait monitoring data.
Custom ReAct loop (no LangChain), MongoDB Atlas vector search over 17 clinical knowledge chunks, and synthetic data modelling realistic PD subtypes with medication wearing-off and freezing episodes. Entire stack runs on free-tier services at zero cost.
Stack: Python · Streamlit · Gemini 2.5 Flash · MongoDB Atlas · sentence-transformers · Pandas · NumPy
Live Demo · GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| LLM | Gemini 2.5 Flash (Google AI Studio, free tier) |
| Agent framework | Custom ReAct loop (no LangChain dependency) — prompt parsing, tool dispatch, iteration control |
| Code execution | Sandboxed Python with pre-loaded pandas DataFrame and patient profile dict |
| RAG | MongoDB Atlas vector search, all-MiniLM-L6-v2 embeddings (sentence-transformers, runs locally), 17 clinical chunks |
| System prompt | ~35k chars assembled from role description, clinical knowledge, data schema, patient profile, 6 few-shot ReAct trajectories, tool descriptions |
| Data | Synthetic gait data: step_length, stride_time, cadence, stride_variability, asymmetry_index, freezing_flag, medication_state, hours_since_dose |
| Frontend | Streamlit with patient selector, context cards, example question buttons, expandable reasoning trace |
Streamlit app that turns domain literature into production-ready knowledge bases for LLM retrieval-augmented generation (RAG). Ingest evidence from PDFs, PubMed, or structured JSON — every chunk is embedded, tagged with domain features, and stored in a ChromaDB vector index that any RAG pipeline can query with semantic search and metadata filtering. No coding required.
Evolved from a hardcoded Alzheimer’s research pipeline into a fully configurable tool where domain experts swap vocabularies, not code. Smart deduplication with cosine-similarity calibration merges near-duplicates without losing tags. Coverage gap tracking shows exactly where your KB is thin before your LLM starts hallucinating. Auto-generated extraction prompts let users paste papers into ChatGPT/Claude/NotebookLM and import the structured output directly.
Stack: Python · Streamlit · ChromaDB · sentence-transformers · Pydantic · Gemini API · PyMuPDF · PubMed E-utilities
GitHub · Private repo — available on request

| Layer | Technology |
|---|---|
| Frontend | Streamlit multipage app (Setup, Add Sources, Knowledge Base), session state management |
| Data models | Pydantic v2: Chunk, SourceInfo, ProjectConfig — all pipeline modules receive ProjectConfig, no hardcoded vocabularies |
| Ingestion | Three sources: JSON import with validation, PDF upload (PyMuPDF section detection + overlapping chunks), PubMed abstract search (NCBI E-utilities, free tier) |
| LLM extraction | Gemini 2.5 Flash via google-genai: dynamic tagging prompts built from ProjectConfig.features, structured JSON output mode |
| Prompt generation | Auto-generated extraction prompts (Prompt A/B) from project config — users paste papers into external LLMs and import the JSON output |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2 default, configurable), generic embedder accepts any model |
| Vector store | ChromaDB with persistent SQLite backend, HNSW index, local — no cloud setup required |
| Deduplication | Cosine similarity with tag merging (not discard). Calibrator shows similarity histogram, percentile stats, threshold impact table, top-N similar pairs |
| Coverage | Per-feature chunk counts, coverage type breakdown, gap warnings, min_chunks_per_feature threshold |
| Normalization | Fuzzy feature-name matching against ProjectConfig vocabulary (handles typos, case variations) |
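The merge-not-discard deduplication described above can be sketched with NumPy: the first occurrence of a near-duplicate survives and absorbs the duplicates' tags, so no metadata is lost (function name and default threshold are mine; the calibrator exists precisely to help pick that threshold):

```python
import numpy as np

def dedupe_merge(embeddings, tags, threshold=0.92):
    """Greedy near-duplicate merge: keep the first chunk in each
    similarity group and union the tag sets of the chunks it absorbs."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T                       # pairwise cosine similarity
    keep, merged_tags = [], []
    for i in range(len(emb)):
        for pos, j in enumerate(keep):
            if sim[i, j] >= threshold:      # near-duplicate of a kept chunk
                merged_tags[pos] |= set(tags[i])   # merge tags, drop the chunk
                break
        else:
            keep.append(i)
            merged_tags.append(set(tags[i]))
    return keep, merged_tags
```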
Multi-sided platform for Athens nightlife that connects three audiences through role-based interfaces. Users discover events happening tonight, see who’s going before they commit, and match with people at the same venue. Venues get a management dashboard for promotion, attendee tracking, and talent booking. Artists find gigs and build their audience through event integration.
Squad matching lets friend groups find events together and discover other groups going to the same place. Designed around cross-role network effects — each new user, venue, or artist makes the platform stronger for everyone else.
Stack: Supabase · React
GitHub · Private repo — available on request

B2B event networking platform designed for Greek SMEs attending professional conferences, trade shows, and workshops. Solves the biggest pain point of business events: walking in blind. Attendees publish structured profiles with explicit intent (looking for / offering), browse other attendees before the event, and schedule qualified meetings in advance.
Organisers get attendee analytics, demographic breakdowns, and engagement tracking. Built around measurable ROI — every interaction is trackable, so SMEs know whether an event was worth attending. Aligned with EDIH digital transformation priorities for the Attica region.
Stack: Supabase · React
GitHub · Private repo — available on request

Python · R · SQL · DAX · JavaScript
Pandas · Power BI · Plotly · ggplot2 · Streamlit · Shiny
Django · HTMX · Tailwind CSS · MongoDB · SQLite · PostgreSQL
LLM integration · RAG pipelines · scikit-learn · Prompt engineering
Docker · Google Cloud Run · Azure · GitHub Actions · WhiteNoise
Statistical disclosure · Fisheries science · EU policy analytics · Digital transformation
Advanced Data Analytics Specialization (2025)
Business Intelligence Specialization (2025)
Azure Machine Learning for Data Scientists (2025)
Power BI & Power Virtual Agents (2025)
Data Visualization & Reporting with Generative AI (2025)
Geospatial Analysis with ArcGIS (2025)
Quantitative Ecology & Statistical Modeling — University of Copenhagen / DTU-Aqua, Denmark (2006–2010). Marie Curie Fellowship. Meta-analysis and hierarchical modeling of population dynamics.
Ecology & Environmental Management — University of Patras, Greece (2003–2006)
Biology — University of Patras, Greece (1998–2003)