Case Study — Personal project
Fundibot: Making South African Higher Education Accessible
A free, no-login platform helping matriculants find courses, check APS requirements, calculate points scores, and plan their careers. Built from scraped prospectuses, enriched JSON datasets, and a determination to give every South African learner the information they deserve.
Built by Ayabonga Qwabi · Qwabi Engineering · 2025 - ongoing
The problem
Grade 12 ends. Then what?
- University prospectuses are 200-page PDFs scattered across 26 different institutional websites
- APS, FPS, WPS, and UWC Points are different systems — no single tool explains all of them
- Thousands of Eastern Cape learners have no guidance counsellor and no data plan to research options
The solution
One platform. Free. No login.
Fundibot aggregates courses, admission requirements, fee estimates, bursaries, and career paths from across South Africa. A learner in Komani with a R5 data bundle can check what they qualify for, see what they could earn, and find bursaries that cover their field — all in one place.
Current coverage
Eastern Cape institutions are well covered — University of Fort Hare, Walter Sisulu University, and multiple TVET colleges. Western Cape expansion (UCT, Stellenbosch, CPUT) is in progress. 23+ prospectuses processed. 75+ institutions targeted nationwide.
Technical approach
Scraping 200-page PDFs and making sense of them.
South African institutions publish data in every format imaginable — scanned PDFs, Word documents exported to web, HTML tables with no structure. The enrichment pipeline had to handle all of it.
PDF scraping at scale
23+ university and TVET prospectuses processed, most of them multi-hundred-page PDFs. Extraction combined pdfplumber for pages with a usable text layer, Tesseract OCR for scanned pages, and Claude AI for extraction and structuring where text layers were absent or corrupt.
Admission system fragmentation
SA institutions use at least 4 different points systems: standard NSC APS, UCT Faculty Points Score (FPS) and Weighted Points Score (WPS), and UWC's own Code-based points table. Each required a separate detection and calculation model.
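As a minimal sketch of the simplest of these systems, the standard NSC APS can be computed as below. It assumes the common convention of converting each percentage to an achievement level from 1 to 7 and summing the best six subjects with Life Orientation excluded. Institutions vary both rules, so they are parameters here, and the function names are illustrative rather than taken from the Fundibot codebase.

```python
def nsc_level(percent: float) -> int:
    """Convert a percentage mark to an NSC achievement level (1 to 7)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"mark out of range: {percent}")
    if percent < 30:
        return 1
    # 30-39 -> 2, 40-49 -> 3, ..., 70-79 -> 6, 80-100 -> 7
    return min(7, int(percent // 10) - 1)


def standard_aps(marks, best_n=6, excluded=("Life Orientation",)):
    """Sum the achievement levels of the best `best_n` counted subjects.

    Assumption: Life Orientation is excluded and six subjects count.
    Institutions differ on both points, so both are parameters.
    """
    levels = sorted(
        (nsc_level(mark) for subject, mark in marks.items()
         if subject not in excluded),
        reverse=True,
    )
    return sum(levels[:best_n])


# Illustrative marks only, not real learner data.
example_marks = {
    "English HL": 72,
    "Mathematics": 65,
    "Physical Sciences": 58,
    "Life Sciences": 81,
    "Geography": 49,
    "isiXhosa FAL": 77,
    "Life Orientation": 90,
}
print(standard_aps(example_marks))  # 31
```

The UCT and UWC systems would each need their own function with institution-specific weightings, which this sketch does not attempt.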
Enrichment pipeline (enrich_pipeline.py)
A Python pipeline that detects three kinds of pattern: South African currency (R followed by digits, with thousand separators); admission requirement strings in formats like "Mathematics 4", "70% for English", and "Engl HL/FAL 4 and Maths 4 and at least three of the designated 20 credit subjects"; and qualification types including NCV Level 2/3/4, NATED N4–N6, Degrees, Diplomas, and Advanced Diplomas.
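The detection step can be sketched with regular expressions. The patterns and function names below are assumptions written for illustration, not the contents of enrich_pipeline.py, and they cover only the currency and subject-requirement formats quoted above.

```python
import re

# "R" followed by digits, optionally grouped in thousands by spaces or commas.
# Cents are ignored in this sketch.
RAND_RE = re.compile(r"\bR\s?(\d{1,3}(?:[ ,]\d{3})+|\d+)\b")

# One or more capitalised words, optionally followed by a language-level tag.
SUBJECT = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*(?:\s(?:HL/FAL|HL|FAL))?"
LEVEL_RE = re.compile(rf"\b(?P<subject>{SUBJECT})\s(?P<level>[1-7])\b")
PERCENT_RE = re.compile(
    rf"\b(?P<percent>\d{{1,3}})%\s+for\s+(?P<subject>{SUBJECT})"
)


def find_rand_amounts(text):
    """Return every rand amount in the text as a whole number of rands."""
    return [int(re.sub(r"[ ,]", "", m.group(1))) for m in RAND_RE.finditer(text)]


def parse_requirements(text):
    """Return subject requirements: level-based first, then percentage-based."""
    found = [
        {"subject": m["subject"], "min_level": int(m["level"])}
        for m in LEVEL_RE.finditer(text)
    ]
    found += [
        {"subject": m["subject"], "min_percent": int(m["percent"])}
        for m in PERCENT_RE.finditer(text)
    ]
    return found


print(find_rand_amounts("Tuition is R45 000 per year; registration R1,500."))
print(parse_requirements("Engl HL/FAL 4 and Maths 4 and at least three "
                         "of the designated 20 credit subjects"))
print(parse_requirements("70% for English"))
```

A pattern this loose would also read "NCV Level 4" as a subject called "Level", so qualification types would need to be detected and removed from the string first.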
TVET default logic
TVET colleges follow DHET defaults that no prospectus states explicitly. NCV Level 2 requires Grade 9. NCV Level 3 requires NCV Level 2. NCV Level 4 requires NCV Level 3. NATED N4–N6 requires Matric. The pipeline infers these where no requirement is stated and flags inferred entries for review.
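A minimal sketch of that inference, using the defaults listed above. The record fields (`qualification_type`, `admission_requirement`, `inferred`) and the function name are hypothetical, chosen for illustration.

```python
# Defaults as described in the text; keys are hypothetical qualification labels.
TVET_DEFAULTS = {
    "NCV Level 2": "Grade 9",
    "NCV Level 3": "NCV Level 2",
    "NCV Level 4": "NCV Level 3",
    "NATED N4": "Matric",
    "NATED N5": "Matric",
    "NATED N6": "Matric",
}


def apply_tvet_default(programme):
    """Return a copy of the record, filling in a default requirement if needed.

    Scraped requirements are never overwritten. Inferred ones are flagged
    so that they can be reviewed and surfaced in the UI.
    """
    result = dict(programme)
    if result.get("admission_requirement"):
        result.setdefault("inferred", False)
        return result
    default = TVET_DEFAULTS.get(result.get("qualification_type", ""))
    if default is not None:
        result["admission_requirement"] = default
        result["inferred"] = True
    return result


record = {"name": "Example programme", "qualification_type": "NCV Level 3"}
print(apply_tvet_default(record))
```

Records with an unrecognised qualification type are returned without a requirement, leaving the gap visible instead of guessing.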
Bursary scraping
976 unique bursaries with direct URLs scraped from zabursaries.co.za using Claude in Chrome browser automation. The site blocks requests sent from Python scripts, so browser-level scraping was the only path. Data is structured by field of study category and subcategory.
Structured JSON output
All scraped and enriched data compiles to typed JSON datasets consumed by the frontend tools. Each dataset has a metadata block with source, scrape date, confidence indicators, and inferred-flag counts so the UI can surface data quality honestly.
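One possible shape for such a dataset is sketched below. The key names, the confidence label, and the sample values are assumptions for illustration; the actual Fundibot schema may differ.

```python
import json
from datetime import date


def build_dataset(records, source, scraped_on, confidence):
    """Wrap records in a dataset with a metadata block describing data quality."""
    inferred_count = sum(1 for record in records if record.get("inferred"))
    return {
        "metadata": {
            "source": source,
            "scrape_date": scraped_on.isoformat(),
            "confidence": confidence,
            "record_count": len(records),
            "inferred_count": inferred_count,
        },
        "records": records,
    }


# Placeholder records, not real scraped data.
records = [
    {"programme": "Example Diploma", "admission_requirement": "Mathematics 4",
     "inferred": False},
    {"programme": "Example NCV Level 2", "admission_requirement": "Grade 9",
     "inferred": True},
]
dataset = build_dataset(records, source="example-prospectus.pdf",
                        scraped_on=date(2025, 1, 15), confidence="medium")
print(json.dumps(dataset, indent=2))
```

Counting inferred entries at build time means the frontend can show a data-quality summary without scanning every record.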
What was built
Seven tools. One platform.
01
Course and Institution Finder
Enter a course name or institution. Filter by province, qualification type, and APS score. Returns matching programmes with admission requirements, fee estimates, and institution contact details.
02
University Qualification Checker
Student enters their subjects and marks. The tool calculates APS, UCT FPS/WPS, and UWC points in parallel, then shows which programmes at which institutions they qualify for — with band indicators for guaranteed, probable, and possible admission.
03
What Could You Earn?
Subjects and marks map to career fields, then to specific occupations, then to real ZAR salary ranges from Paylab SA survey data. 154 occupations across 26 sectors. First result is a hero card with a counting salary animation.
04
Predict My Future
A personalised timeline from the learner's current grade to a marriage milestone. Pulls the occupation from the salary tool, the institution from the qualification checker, and the employer from the SA companies dataset. Random positive and character-building events are inserted for realism and fun.
05
Bursary and Funding Matcher
976 bursaries with direct URLs to zabursaries.co.za. Student filters by field of study and province. NSFAS eligibility check included. Government bursaries filtered by province match.
06
Study Cost and Fee Calculator
Estimates total cost to graduation including tuition, accommodation, and living costs. NSFAS coverage estimator. Year-by-year projector with 8% annual fee increase default. PDF export of cost summary.
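The year-by-year projection is compound growth on the first-year costs. The sketch below assumes the 8% default is applied to tuition, accommodation, and living costs alike, which is a simplification; the function name and the amounts are illustrative, not real fee data.

```python
def project_costs(tuition, accommodation, living, years, annual_increase=0.08):
    """Project yearly study costs, escalating every component at the same rate."""
    rows = []
    for index in range(years):
        # Year 1 uses the base amounts; each later year compounds the increase.
        factor = (1 + annual_increase) ** index
        row = {
            "year": index + 1,
            "tuition": round(tuition * factor, 2),
            "accommodation": round(accommodation * factor, 2),
            "living": round(living * factor, 2),
        }
        row["total"] = round(
            row["tuition"] + row["accommodation"] + row["living"], 2
        )
        rows.append(row)
    return rows


# Illustrative amounts in rands for a three-year qualification.
projection = project_costs(tuition=40_000, accommodation=30_000,
                           living=20_000, years=3)
for row in projection:
    print(row["year"], row["total"])
print("Total to graduation:", round(sum(r["total"] for r in projection), 2))
```

With these inputs the yearly totals are R90 000, R97 200, and R104 976, so the three-year cost is R292 176.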
07
Institution Stats Dashboard
Visualisations of the full dataset: programmes by institution, by province, by qualification type, by admission data coverage. Attribution block crediting all scraped sources.
Data transparency
The data on Fundibot is compiled from publicly available prospectuses, university websites, zabursaries.co.za, nationalgovernment.co.za, Wikimedia, and other open sources. Each entry carries a source tag and, where data was inferred rather than scraped, an 'Inferred' flag so users know to verify. We strive for accuracy and update as institutions publish new information. Always verify admission requirements, fees, and closing dates directly with the institution before making any study decisions.
What this project taught
Lessons from building at the edge of public data.
01
SA education data is fragmented, whether by design or by neglect.
No two institutions publish admission data in the same format. Some APS tables are images. Some fee schedules are 4-year-old PDFs. Building Fundibot exposed how much friction sits between learners and the information they need to make life decisions.
02
PDF scraping at scale requires fallbacks at every step.
pdfplumber handles text PDFs well. Scanned PDFs need OCR. AI extraction handles the edge cases OCR misses. No single tool covers everything — the pipeline has to degrade gracefully and flag failures rather than silently drop data.
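That fallback chain can be sketched as an ordered list of extractors tried in turn. The extractors here are stand-in functions and all names are hypothetical. In a real pipeline they might wrap pdfplumber's `page.extract_text()`, pytesseract's `image_to_string`, and an AI extraction call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageResult:
    page: int
    text: Optional[str]    # None when every extractor failed
    method: Optional[str]  # name of the extractor that succeeded
    errors: list           # why earlier extractors were rejected


def extract_page(page, extractors, min_chars=20):
    """Try each (name, function) extractor in order; record every failure."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(page)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue
        if text and len(text.strip()) >= min_chars:
            return PageResult(page, text.strip(), name, errors)
        errors.append(f"{name}: too little text")
    # Nothing worked: return a flagged result instead of dropping the page.
    return PageResult(page, None, None, errors)


# Stand-in extractors that simulate a scanned page with no text layer.
def text_layer(page):
    return ""

def ocr(page):
    return "Diploma in Information Technology: Mathematics 4"

def ai_extract(page):
    raise RuntimeError("not reached in this example")


result = extract_page(2, [("text_layer", text_layer), ("ocr", ocr),
                          ("ai", ai_extract)])
print(result.method, result.errors)
```

Because a failed page still produces a `PageResult`, the failure shows up in review queues and coverage counts instead of disappearing.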
03
Fun tools are not shallow tools.
The Predict My Future timeline and salary estimator are the most engaged-with parts of the platform. Students do not want a database — they want to see themselves in the future. The emotional layer is the product.
04
Saying 'work in progress' builds more trust than hiding it.
Every data point on Fundibot that was inferred rather than explicitly scraped is flagged. Users can see the coverage and the gaps. This transparency has generated more positive feedback than any polished-but-silent system would have.
What comes next
Fundibot is not finished.
- Complete Western Cape coverage — UCT, Stellenbosch, CPUT, UWC, and Western Cape TVET colleges fully enriched with programme, admission, and fee data
- All 9 provinces — Every public university and TVET college in South Africa represented with at minimum programme names and NQF levels
- Private colleges — Boston City Campus, Damelin, IMM, Rosebank College, and others added with distinct institution type labelling
- Application deadline tracker — Real opening and closing dates per institution, with status indicators and optional reminder links
- Study group and peer matching — Optional lightweight accounts for learners to save their profiles, share timelines, and connect with others in the same field
- Improved pricing data — Per-module fee breakdowns where available, not just institutional averages
Want a system like this?
Every business problem has a system waiting to be built.
Fundibot is a personal project. But the same scraping, enrichment, and data pipeline thinking applies to any domain with messy, fragmented public or internal data. If you are sitting on a data problem, let's talk.