Applying Expert Seeker Technologies to WHO Staff and Record Listings
Source: “Web Data Mining Techniques for Expertise-Locator Knowledge Management Systems” by Becerra-Fernandez & Rodriguez
Overview
The Expert Seeker system, developed for NASA, is an expertise-locator knowledge management system (KMS) designed to automatically profile employee expertise using web data mining and information retrieval (IR) techniques. This framework is highly applicable to the WHO for:
- Mapping staff expertise globally,
- Automating profile updates,
- Building search engines for rapid response team formation.
Core Features and Technologies (Applied to WHO)
Feature | Description | WHO Application
Web Content Mining | Extracts text from organizational webpages and documents to infer expertise. | Mine WHO reports, publications, and policy briefs for staff expertise profiles.
Name-Finding Algorithm | Identifies employee names using fuzzy matching against known variants. | Match multilingual or alias variants of WHO staff (e.g., “Dr. M. Ndiaye” vs “Marie Ndiaye”).
Inverted Index & IR | Maps keywords to document occurrences and links them to staff names. | Quickly associate “malaria vaccine” or “COVID-19 modeling” with contributing experts.
TF-IDF + Stemming | Extracts high-value keywords after removing stopwords and applying linguistic normalization. | Reduce “epidemiology” and “epidemiologist” to a shared stem so they count as one term; retrieve domain-specific terms.
Self-Updating Profiles | Periodically reprocesses documents to maintain skill profiles. | Automatically keep expert directories current with new WHO publications.
No Fixed Taxonomy Required | Uses clustering of keywords to infer skill categories. | Flexibly adapt to emerging health topics (e.g., “MPox” or “One Health”).
Precision Ranking | Ranks employees by keyword-document co-occurrence frequency. | Prioritize experts for queries like “mental health response Haiti” or “cholera outbreak Yemen.”
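As a rough illustration of the TF-IDF + Stemming feature, the sketch below scores terms per document using only the Python standard library. The suffix list in crude_stem, the stopword set, and the sample texts are assumptions made for this example; they are not taken from Expert Seeker, and a production system would more likely use a real stemmer together with scikit-learn's TfidfVectorizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); real systems use fuller lists.
STOPWORDS = {"the", "of", "and", "in", "for", "a", "to", "on", "is", "by"}

# Toy suffix list (assumption), a stand-in for a proper Porter/Snowball stemmer.
SUFFIXES = ("ologists", "ologist", "ological", "ology", "ists", "ist", "ing", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least four characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


def tfidf_keywords(docs: dict[str, str], top_n: int = 3) -> dict[str, list[str]]:
    """Return the top_n highest-scoring stems for each document."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(docs)
    keywords = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        keywords[doc_id] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords


# Hypothetical document snippets, for illustration only.
SAMPLE_DOCS = {
    "malaria_report": "Malaria vaccine rollout and insecticide resistance in malaria vectors",
    "cholera_note": "Cholera outbreak response and oral cholera vaccine campaigns",
    "epi_brief": "Epidemiology training for epidemiologists in outbreak response",
}

keywords = tfidf_keywords(SAMPLE_DOCS, top_n=2)
for doc_id, terms in keywords.items():
    print(doc_id, terms)
```

Terms that appear in every document get an IDF of zero here, which is why generic words drop out of the keyword lists without needing a taxonomy.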
Example: Applied to WHO Public Health Staff
Query Keyword | Top Inferred WHO Experts | Data Source
malaria | Dr. Pedro Alonso, Dr. Natacha Protopopoff | WHO Bulletin, WHO Malaria Report
vaccination policy | Dr. Kate O’Brien, Dr. Ann Lindstrand | Strategic Advisory Group of Experts (SAGE) docs
cholera | Dr. Philippe Barboza | WHO Weekly Epidemiological Record
climate & health | Dr. Diarmid Campbell-Lendrum | UNFCCC joint statements, WHO COP reports
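Additional Considerations for WHO
Multilingual Document Support
- WHO publishes in 6 languages (EN, FR, ES, RU, AR, ZH).
- Stemming and keyword recognition need language-specific or multilingual NLP models (e.g., spaCy's xx_ent_wiki_sm for multilingual named-entity recognition, or Hugging Face multilingual transformers). Note that xx_ent_wiki_sm provides NER only, so stemming or lemmatization still requires a per-language component.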
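One way to picture the routing step is a crude script check that sends each document to a language-specific pipeline. The sketch below is an assumption-laden stand-in, not a real language detector: the names dominant_script, pipeline_for, and SCRIPT_TO_PIPELINE are hypothetical, and it can only separate Arabic, Russian, and Chinese by writing system. English, French, and Spanish all use Latin script, so they would still need a statistical detector such as langdetect.

```python
import unicodedata
from collections import Counter

# Unicode character-name prefixes mapped to coarse script labels.
SCRIPT_PREFIXES = {
    "LATIN": "latin",
    "CYRILLIC": "cyrillic",
    "ARABIC": "arabic",
    "CJK": "han",
}

# Hypothetical routing table: which pipeline handles which script.
SCRIPT_TO_PIPELINE = {
    "cyrillic": "ru",
    "arabic": "ar",
    "han": "zh",
    "latin": "latin-undetermined",  # EN/FR/ES need a real language detector
}


def dominant_script(text: str) -> str:
    """Return the script used by most alphabetic characters in the text."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    return counts.most_common(1)[0][0] if counts else "unknown"


def pipeline_for(text: str) -> str:
    """Pick a processing pipeline label based on the dominant script."""
    return SCRIPT_TO_PIPELINE.get(dominant_script(text), "unknown")


for sample in ["Cholera outbreak", "Холера вспышка", "الكوليرا", "霍乱疫情"]:
    print(sample, "->", pipeline_for(sample))
```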
Name Disambiguation
- WHO employs thousands of experts, many with common names.
- Disambiguation may rely on department affiliation, co-author lists, and project participation.
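A minimal sketch of how alias matching and department-based disambiguation could fit together, using difflib from the standard library in place of RapidFuzz. The directory entries other than “Marie Ndiaye”, the 0.85 threshold, and the scoring rule (surname similarity multiplied by given-name compatibility) are all assumptions for illustration, not the Expert Seeker algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

TITLES = {"dr", "prof", "mr", "mrs", "ms"}


@dataclass(frozen=True)
class StaffRecord:
    full_name: str
    department: str


def normalize(name: str) -> list[str]:
    """Lowercase, drop periods and honorifics, and split into name parts."""
    parts = name.lower().replace(".", " ").split()
    return [p for p in parts if p not in TITLES]


def name_score(mention: str, full_name: str) -> float:
    """Score in [0, 1]: surname similarity times given-name compatibility."""
    m, f = normalize(mention), normalize(full_name)
    if not m or not f:
        return 0.0
    surname_sim = SequenceMatcher(None, m[-1], f[-1]).ratio()
    if len(m) == 1 or len(f) == 1:
        return surname_sim * 0.8  # surname alone is weaker evidence
    if len(m[0]) == 1:  # an initial such as "M."
        given_sim = 1.0 if f[0].startswith(m[0]) else 0.0
    else:
        given_sim = SequenceMatcher(None, m[0], f[0]).ratio()
    return surname_sim * given_sim


def resolve(
    mention: str,
    directory: list[StaffRecord],
    department: Optional[str] = None,
    threshold: float = 0.85,
) -> Optional[StaffRecord]:
    """Return the single matching record, or None if none or still ambiguous."""
    candidates = [s for s in directory if name_score(mention, s.full_name) >= threshold]
    if len(candidates) > 1 and department is not None:
        # Use department metadata from the source document to break ties.
        candidates = [s for s in candidates if s.department == department]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical directory entries, for illustration only.
DIRECTORY = [
    StaffRecord("Marie Ndiaye", "Immunization"),
    StaffRecord("Moussa Ndiaye", "Health Emergencies"),
    StaffRecord("Ana Lopez", "Immunization"),
]

print(resolve("Dr. M. Ndiaye", DIRECTORY))
print(resolve("Dr. M. Ndiaye", DIRECTORY, department="Immunization"))
```

Returning None for an ambiguous mention, rather than guessing, keeps a wrong attribution out of someone's expertise profile; co-author lists or project participation could be added as further tie-breakers in the same way.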
Integration Points
Data Type | WHO Source | Integration Use
Staff Directory | WHO Intranet / HR Systems | Name and role grounding
Publications | WHO IRIS, PubMed | Content mining for skill extraction
Projects | GAVI, COVAX, UN joint missions | Project-based expertise inference
Event Participation | Conference rosters | Real-time updates to active experts
Strengths of Method
- No need for employees to manually fill profiles.
- Less exposed to self-reporting bias than self-assessments, since profiles derive from documents rather than self-ratings.
- Detects informal or cross-disciplinary expertise.
- Applicable across both scientific and administrative domains.
Limitations & Mitigations
Challenge | Mitigation
Sparse web presence for some staff | Supplement with internal documents, reports, HR data
Overlapping names | Use department, title, and co-authorship metadata for disambiguation
Indexing administrative vs technical skills | Pre-filter documents by section (e.g., project summaries vs legal memos)
Missing skill taxonomy | Use clustering or domain-specific ontologies (e.g., MeSH, ICD-11)
Recommendations for WHO Pilot
- Phase 1: Select 200 WHO documents and 100 staff (publicly known).
- Phase 2: Build employee–skill maps using content mining (TF-IDF, NER).
- Phase 3: Deploy internal interface for team leads to search by expertise.
- Phase 4: Expand system with multilingual support and PubMed integration.
The following architecture diagram and component breakdown describe a WHO-specific adaptation of the Expert Seeker expertise-locator system. They cover backend, data processing, and user interface components, followed by an optional roadmap for prototyping.
WHO Expertise-Locator System: Architecture Overview
Architecture Diagram (Text Representation)
┌────────────────────────────┐
│ WHO Staff Directory (CSV)  │
└────────────┬───────────────┘
             │
             ▼
┌────────────────────────────┐
│ WHO Document Sources       │
│ (IRIS, PDFs, Web Pages)    │
└────────────┬───────────────┘
             ▼
┌────────────────────────────┐
│ Data Collection Layer      │
│ - PDFMiner                 │
│ - BeautifulSoup / wget     │
└────────────┬───────────────┘
             ▼
┌────────────────────────────┐
│ Text Preprocessing Layer   │
│ - Tokenization             │
│ - Language Detection       │
│ - Stemming/Lemmatization   │
└────────────┬───────────────┘
             ▼
┌─────────────────────────────────────────────┐
│ Name Recognition + Disambiguation           │
│ - Map aliases & fuzzy name matching         │
│ - Validate with dept/title from directory   │
└────────────┬────────────────────────────────┘
             ▼
┌─────────────────────────────────────────────┐
│ Keyword Extraction Module                   │
│ - TF-IDF vectorizer                         │
│ - Stopword removal & n-gram models          │
│ - Named Entity Recognition (NER)            │
└────────────┬────────────────────────────────┘
             ▼
┌─────────────────────────────────────────────┐
│ Expertise Profile Construction Layer        │
│ - Map keywords to staff                     │
│ - Rank by keyword-document co-occurrence    │
│ - Save expertise profiles in database       │
└────────────┬────────────────────────────────┘
             ▼
┌─────────────────────────────────────────────┐
│ Search & API Layer                          │
│ - Search by skill, project, location        │
│ - REST API or GraphQL for frontend          │
└────────────┬────────────────────────────────┘
             ▼
┌─────────────────────────────────────────────┐
│ Frontend UI (Expertise Finder)              │
│ - Keyword input/search bar                  │
│ - List of matched experts + links           │
│ - Filters (region, language, role)          │
└─────────────────────────────────────────────┘
Key Technologies by Layer
Layer | Tools / Libraries / Tech Stack
Data Collection | wget, requests, pdfminer.six, BeautifulSoup, PyMuPDF
Preprocessing | spaCy, langdetect, nltk, unidecode
Name Recognition | fuzzywuzzy (now published as thefuzz), RapidFuzz, rule-based alias mapping
Keyword Extraction | scikit-learn (TF-IDF), spaCy, KeyBERT, BERTopic
Expert Profile Construction | pandas, numpy, custom Python logic
Backend API | FastAPI or Flask, Elasticsearch for keyword search indexing
Database | PostgreSQL, SQLite, or MongoDB
Frontend UI | React.js, Streamlit, or Flask+Jinja2 for prototyping
Sample Query Flow
Query: “cholera outbreak response”
- The query passes through the same TF-IDF and NER layer used for documents.
- The system fetches all documents indexed under “cholera” and “outbreak”.
- Co-occurrence analysis finds the employees named in those documents.
- Result: a ranked list of experts with document links and department metadata.
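The query flow above can be sketched with a small in-memory inverted index. This is a simplified illustration under stated assumptions: the document records, the placeholder staff names, and the choice to require every query term (a strict AND) are hypothetical, and names are assumed to have been resolved against the staff directory already.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, dict]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["text"]):
            index[term].add(doc_id)
    return index


def find_experts(query: str, docs: dict[str, dict], index: dict[str, set[str]]):
    """Rank staff by how many documents matching all query terms name them."""
    terms = tokenize(query)
    if not terms:
        return []
    # Documents that contain every query term.
    matching = set.intersection(*(index.get(t, set()) for t in terms))
    hits = defaultdict(list)
    for doc_id in sorted(matching):
        for name in docs[doc_id]["staff"]:
            hits[name].append(doc_id)
    # Most supporting documents first; ties broken by name.
    return sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical documents with already-resolved placeholder staff names.
DOCS = {
    "d1": {"text": "Cholera outbreak response plan for coastal districts",
           "staff": ["A. Example", "B. Sample"]},
    "d2": {"text": "Oral cholera vaccine use during outbreak response",
           "staff": ["A. Example"]},
    "d3": {"text": "Malaria vector surveillance guidelines",
           "staff": ["C. Placeholder"]},
}
INDEX = build_inverted_index(DOCS)

for name, doc_ids in find_experts("cholera outbreak response", DOCS, INDEX):
    print(name, doc_ids)
```

Each result keeps the list of supporting document ids, so a search interface can show the evidence behind a ranking as links.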
WHO Pilot Roadmap
Phase | Milestone | Description
1 | Prototype Setup | Build initial pipeline for 100 WHO documents + 50 staff entries
2 | Expertise Mapping | Apply name matching + TF-IDF extraction across documents
3 | Evaluation | Validate precision by cross-referencing known staff roles
4 | UI Launch | Deploy basic search tool (e.g., Streamlit or React interface)
5 | WHO Integration | Add multilingual + live document crawling (WHO IRIS, PubMed)
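The Phase 3 evaluation could be expressed as precision at k: the share of the top k returned experts who also appear in a reference set built from known staff roles. The function name, the placeholder names, and the reference set below are hypothetical; the sketch only shows the arithmetic.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the first k ranked names that appear in the reference set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked[:k]
    correct = sum(1 for name in top_k if name in relevant)
    return correct / k


# Hypothetical system output for one query, best match first.
ranked_experts = ["A. Example", "B. Sample", "C. Placeholder", "D. Other"]

# Hypothetical reference set: staff whose known role covers the query topic.
known_role_holders = {"A. Example", "C. Placeholder"}

for k in (1, 2, 4):
    print(k, precision_at_k(ranked_experts, known_role_holders, k))
```

Dividing by k rather than by the number of results returned penalizes queries that produce fewer than k experts, which seems the safer convention for a pilot with sparse data.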
Example Output (Prototype UI Mock)
Search: vector-borne diseases
Top WHO Experts:
1. Dr. Pedro Alonso (Global Malaria Programme)
   - Keywords: malaria, Plasmodium, insecticide resistance
   - Docs: [Malaria Report 2023], [Vector Surveillance Guidelines]
2. Dr. Sarah Hoibak (Health Emergencies Programme)
   - Keywords: dengue, outbreak response, Aedes aegypti
   - Docs: [Yellow Fever Protocol], [Outbreak Alert 2024]