🧠 Applying Expert Seeker Technologies to WHO Staff and Record Listings





Source: “Web Data Mining Techniques for Expertise-Locator Knowledge Management Systems” by Becerra-Fernandez & Rodriguez






🗂️ Overview



The Expert Seeker system, developed for NASA, is an expertise-locator knowledge management system (KMS) that automatically profiles employee expertise using web data mining and information retrieval (IR) techniques. The same approach could be adapted by the WHO for:


  • Mapping staff expertise globally,
  • Automating profile updates,
  • Building search engines for rapid response team formation.






📌 Core Features and Technologies (Applied to WHO)


| Feature | Description | WHO Application |
|---|---|---|
| Web Content Mining | Extracts text from organizational webpages and documents to infer expertise. | Mine WHO reports, publications, and policy briefs for staff expertise profiles. |
| Name-Finding Algorithm | Identifies employee names using fuzzy matching against known variants. | Match multilingual or alias variants of WHO staff (e.g., “Dr. M. Ndiaye” vs “Marie Ndiaye”). |
| Inverted Index & IR | Maps keywords to document occurrences and links them to staff names. | Quickly associate “malaria vaccine” or “COVID-19 modeling” with contributing experts. |
| TF-IDF + Stemming | Extracts high-value keywords after removing stopwords and applying linguistic normalization. | Treat “epidemiology” and “epidemiologist” as morphologically related terms; retrieve domain-specific terms. |
| Self-Updating Profiles | Periodically reprocesses documents to maintain skill profiles. | Automatically keep expert directories current with new WHO publications. |
| No Fixed Taxonomy Required | Uses clustering of keywords to infer skill categories. | Flexibly adapt to emerging health topics (e.g., “MPox” or “One Health”). |
| Precision Ranking | Ranks employees by keyword-document co-occurrence frequency. | Prioritize experts for queries like “mental health response Haiti” or “cholera outbreak Yemen.” |
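To make the "TF-IDF + Stemming" row concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the Expert Seeker implementation: `crude_stem` is a hypothetical stand-in for a real stemmer (Porter or Snowball), the stopword list is a tiny sample, the three documents are invented, and the score is the plain unsmoothed variant tf × ln(N / df). In practice a library vectorizer would likely replace all of this.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def crude_stem(token):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ological", "ologist", "ology", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def tokenize(text):
    """Lowercase, keep hyphenated terms such as covid-19, drop stopwords, stem."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]

def tfidf(docs):
    """Return {doc_id: {term: score}} with score = tf * ln(N / df)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    n_docs = len(docs)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        scores[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

def top_keywords(doc_scores, k=3):
    """Highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Invented documents, for illustration only.
DOCS = {
    "doc1": "Epidemiology of malaria and malaria vaccine trials",
    "doc2": "An epidemiologist reports on cholera surveillance",
    "doc3": "Cholera vaccine stockpile and surveillance guidance",
}

scores = tfidf(DOCS)
print(top_keywords(scores["doc1"], 2))  # ['malaria', 'trial']
```

Both “epidemiology” and “epidemiologist” reduce to the same stem here, so they count as one term when document frequencies are computed. Terms that appear in most documents score low, which is what pushes distinctive terms such as “malaria” to the top of a profile.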





🔍 Example: Applied to WHO Public Health Staff


| Query Keyword | Top Inferred WHO Experts | Data Source |
|---|---|---|
| malaria | Dr. Pedro Alonso, Dr. Natacha Protopopoff | WHO Bulletin, WHO Malaria Report |
| vaccination policy | Dr. Kate O’Brien, Dr. Ann Lindstrand | Strategic Advisory Group of Experts (SAGE) docs |
| cholera | Dr. Philippe Barboza | WHO Weekly Epidemiological Record |
| climate & health | Dr. Diarmid Campbell-Lendrum | UNFCCC joint statements, WHO COP reports |





🧩 Additional Considerations for WHO




🔀 Multilingual Document Support



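  • WHO publishes in six official languages (EN, FR, ES, RU, AR, ZH).
  • Stemming and keyword recognition need language-aware NLP: per-language stemmers or lemmatizers, plus multilingual models for entity recognition (e.g., spaCy's xx_ent_wiki_sm, which provides NER only, or HuggingFace multilingual transformers).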
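One way to picture the language-aware step is a small router that guesses a document's language and then applies that language's stopword list. The sketch below is a toy under stated assumptions: the stopword samples are tiny, `guess_language` and `content_words` are hypothetical names, only three of the six languages are covered, and whitespace-style tokenization would not work for Chinese. A dedicated language-detection library would normally replace the guessing step.

```python
import re

# Tiny illustrative stopword samples; real lists would be far larger.
STOPWORDS = {
    "en": {"the", "of", "and", "in", "for", "to", "is"},
    "fr": {"le", "la", "les", "de", "des", "du", "et", "pour", "dans"},
    "es": {"el", "la", "los", "de", "y", "para", "en", "del"},
}

def words(text):
    """Lowercased word tokens; \\w also matches accented letters."""
    return re.findall(r"\w+", text.lower())

def guess_language(text):
    """Pick the language whose stopwords occur most often (ties go to the first listed)."""
    tokens = words(text)
    hits = {lang: sum(t in stop for t in tokens) for lang, stop in STOPWORDS.items()}
    return max(hits, key=hits.get)

def content_words(text):
    """Drop the stopwords of the guessed language, keep everything else."""
    lang = guess_language(text)
    return [t for t in words(text) if t not in STOPWORDS[lang]]

print(guess_language("Vigilancia del cólera en la región y los países"))
print(content_words("Surveillance du choléra dans la région"))
```

The guessed language code could then select the matching stemmer or lemmatizer before keyword extraction, so that French and Spanish documents contribute to the same expertise profiles as English ones.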




👥 Name Disambiguation



  • WHO employs thousands of experts, many with common names.
  • Disambiguation may rely on department affiliation, co-author lists, and project participation.
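The combination of fuzzy name matching and department-based disambiguation can be sketched with the standard library's `difflib`. Everything specific here is an assumption for illustration: the directory rows are made up (only “Marie Ndiaye” and “Dr. M. Ndiaye” echo the example earlier in this post), and the `threshold` and `dept_bonus` values are arbitrary starting points rather than tuned parameters.

```python
from difflib import SequenceMatcher

# Hypothetical directory rows; names and departments are illustrative only.
DIRECTORY = [
    {"id": "s1", "name": "Marie Ndiaye", "aliases": ["M. Ndiaye"], "dept": "Immunization"},
    {"id": "s2", "name": "Moussa Ndiaye", "aliases": ["M. Ndiaye"], "dept": "Health Emergencies"},
    {"id": "s3", "name": "Ana Silva", "aliases": ["A. Silva"], "dept": "Immunization"},
]

TITLES = {"dr", "prof", "mr", "ms", "mrs"}

def normalize(name):
    """Lowercase, remove periods, and drop honorifics."""
    tokens = name.lower().replace(".", " ").split()
    return " ".join(t for t in tokens if t not in TITLES)

def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def resolve(mention, context, threshold=0.8, dept_bonus=0.1):
    """Return the id of the best-matching staff member, or None if nobody is close enough."""
    best_id, best_score = None, 0.0
    for row in DIRECTORY:
        variants = [row["name"], *row["aliases"]]
        score = max(similarity(mention, v) for v in variants)
        if score < threshold:
            continue
        # Department named in the surrounding text breaks ties between namesakes.
        if row["dept"].lower() in context.lower():
            score += dept_bonus
        if score > best_score:
            best_id, best_score = row["id"], score
    return best_id

print(resolve("Dr. M. Ndiaye", "Briefing from the Immunization department"))  # s1
print(resolve("Dr. M. Ndiaye", "Health Emergencies outbreak update"))         # s2
```

When the context gives no department hint, this sketch simply returns the first equally good candidate. A production system would more likely flag such mentions as ambiguous and fall back on co-author lists or project participation.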




📊 Integration Points


| Data Type | WHO Source | Integration Use |
|---|---|---|
| Staff Directory | WHO Intranet / HR Systems | Name and role grounding |
| Publications | WHO IRIS, PubMed | Content mining for skill extraction |
| Projects | GAVI, COVAX, UN joint missions | Project-based expertise inference |
| Event Participation | Conference rosters | Real-time updates to active experts |





🧪 Strengths of Method



  • No need for employees to manually fill profiles.
  • Less prone to self-reporting bias than self-assessed profiles.
  • Detects informal or cross-disciplinary expertise.
  • Applicable across both scientific and administrative domains.






🚧 Limitations & Mitigations


| Challenge | Mitigation |
|---|---|
| Sparse web presence for some staff | Supplement with internal documents, reports, HR data |
| Overlapping names | Use department, title, and co-authorship metadata for disambiguation |
| Indexing administrative vs technical skills | Pre-filter documents by section (e.g., project summaries vs legal memos) |
| Missing skill taxonomy | Use clustering or domain-specific ontologies (e.g., MeSH, ICD-11) |





🚀 Recommendations for WHO Pilot



  1. Phase 1: Select 200 WHO documents and 100 staff (publicly known).
  2. Phase 2: Build employee–skill maps using content mining (TF-IDF, NER).
  3. Phase 3: Deploy internal interface for team leads to search by expertise.
  4. Phase 4: Expand system with multilingual support and PubMed integration.






Below is an architecture diagram and component breakdown for a WHO-specific adaptation of the Expert Seeker expertise-locator system. It covers the backend, data processing, and user interface components, followed by a roadmap for prototyping.





🏗 WHO Expertise-Locator System: Architecture Overview






📊 Architecture Diagram (Text Representation)


                          ┌────────────────────────────┐
                          │  WHO Staff Directory (CSV) │
                          └────────────┬───────────────┘
                                       │
                                       ▼
                          ┌────────────────────────────┐
                          │  WHO Document Sources      │
                          │  (IRIS, PDFs, Web Pages)   │
                          └────────────┬───────────────┘
                                       ▼
                          ┌────────────────────────────┐
                          │  Data Collection Layer     │
                          │  - PDFMiner                │
                          │  - BeautifulSoup / wget    │
                          └────────────┬───────────────┘
                                       ▼
                          ┌────────────────────────────┐
                          │  Text Preprocessing Layer  │
                          │  - Tokenization            │
                          │  - Language Detection      │
                          │  - Stemming/Lemmatization  │
                          └────────────┬───────────────┘
                                       ▼
             ┌─────────────────────────────────────────────┐
             │        Name Recognition + Disambiguation    │
             │  - Map aliases & fuzzy name matching        │
             │  - Validate with dept/title from directory  │
             └────────────┬────────────────────────────────┘
                          ▼
             ┌─────────────────────────────────────────────┐
             │          Keyword Extraction Module          │
             │  - TF-IDF vectorizer                        │
             │  - Stopword removal & n-gram models         │
             │  - Named Entity Recognition (NER)           │
             └────────────┬────────────────────────────────┘
                          ▼
             ┌─────────────────────────────────────────────┐
             │    Expertise Profile Construction Layer     │
             │  - Map keywords to staff                    │
             │  - Rank by keyword-document co-occurrence   │
             │  - Save expertise profiles in database      │
             └────────────┬────────────────────────────────┘
                          ▼
             ┌─────────────────────────────────────────────┐
             │             Search & API Layer              │
             │  - Search by skill, project, location       │
             │  - REST API or GraphQL for frontend         │
             └────────────┬────────────────────────────────┘
                          ▼
             ┌─────────────────────────────────────────────┐
             │       Frontend UI (Expertise Finder)        │
             │  - Keyword input/search bar                 │
             │  - List of matched experts + links          │
             │  - Filters (region, language, role)         │
             └─────────────────────────────────────────────┘





🧰 Key Technologies by Layer


| Layer | Tools / Libraries / Tech Stack |
|---|---|
| Data Collection | wget, requests, pdfminer.six, BeautifulSoup, PyMuPDF |
| Preprocessing | spaCy, langdetect, nltk, unidecode |
| Name Recognition | fuzzywuzzy, RapidFuzz, rule-based alias mapping |
| Keyword Extraction | scikit-learn (TF-IDF), spaCy, KeyBERT, BERTopic |
| Expert Profile Construction | pandas, numpy, custom Python logic |
| Backend API | FastAPI or Flask, Elasticsearch for keyword search indexing |
| Database | PostgreSQL, SQLite, or MongoDB |
| Frontend UI | React.js, Streamlit, or Flask+Jinja2 for prototyping |





🧪 Sample Query Flow



Query: “cholera outbreak response”



  1. The query is processed by the TF-IDF and NER layer to extract its key terms.
  2. The system fetches all documents indexed under “cholera” and “outbreak”.
  3. Co-occurrence analysis finds the employees named in those documents.
  4. Result: a ranked list of experts with document links and department metadata.
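The flow above can be sketched as an inverted index lookup followed by a co-occurrence count. This is a minimal illustration, not the production design: the corpus is a hypothetical, already pre-processed dictionary, the staff names are placeholders, and query terms are combined with AND semantics, which is one possible choice among several.

```python
from collections import Counter, defaultdict

# Hypothetical pre-processed corpus: extracted keywords and staff names per document.
DOCS = {
    "d1": {"keywords": {"cholera", "outbreak", "response"}, "staff": ["A. Expert", "B. Expert"]},
    "d2": {"keywords": {"cholera", "outbreak", "vaccine"}, "staff": ["A. Expert"]},
    "d3": {"keywords": {"malaria", "vaccine"}, "staff": ["C. Expert"]},
    "d4": {"keywords": {"cholera", "surveillance"}, "staff": ["B. Expert"]},
}

def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for keyword in doc["keywords"]:
            index[keyword].add(doc_id)
    return index

def rank_experts(query, docs, index):
    """Rank staff by how many documents matching every query term mention them."""
    terms = query.lower().split()
    if not terms:
        return []
    # AND semantics: keep only documents that contain all query terms.
    matching = set.intersection(*(index.get(t, set()) for t in terms))
    counts = Counter(name for doc_id in matching for name in docs[doc_id]["staff"])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (name, n, sorted(d for d in matching if name in docs[d]["staff"]))
        for name, n in ranked
    ]

index = build_inverted_index(DOCS)
for name, n_docs, doc_ids in rank_experts("cholera outbreak", DOCS, index):
    print(name, n_docs, doc_ids)
```

Each result carries the supporting document ids, which is what lets the interface show document links next to every expert. Weighting the count by TF-IDF score instead of treating every document equally would be a natural refinement.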






📅 WHO Pilot Roadmap


| Phase | Milestone | Description |
|---|---|---|
| 1 | 🏗 Prototype Setup | Build initial pipeline for 100 WHO documents + 50 staff entries |
| 2 | 🔍 Expertise Mapping | Apply name matching + TF-IDF extraction across documents |
| 3 | 🧪 Evaluation | Validate precision by cross-referencing known staff roles |
| 4 | 💡 UI Launch | Deploy basic search tool (e.g., Streamlit or React interface) |
| 5 | 🌐 WHO Integration | Add multilingual + live document crawling (WHO IRIS, PubMed) |
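For the evaluation phase, one simple way to "validate precision" is precision at k: the share of the top k returned experts that also appear in a reference list built from known staff roles. The sketch below assumes such a reference list exists; the names and the choice of k are placeholders, not results.

```python
def precision_at_k(ranked, relevant, k):
    """Number of relevant names among the top k results, divided by k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(name in relevant for name in ranked[:k]) / k

def mean_precision_at_k(results, references, k):
    """Average precision at k over all queries that have a reference list."""
    queries = [q for q in results if q in references]
    if not queries:
        return 0.0
    return sum(precision_at_k(results[q], references[q], k) for q in queries) / len(queries)

# Placeholder system output and reference lists, for illustration only.
results = {
    "cholera": ["A. Expert", "B. Expert", "C. Expert"],
    "malaria": ["C. Expert", "D. Expert", "A. Expert"],
}
references = {
    "cholera": {"A. Expert", "C. Expert"},
    "malaria": {"C. Expert"},
}

print(precision_at_k(results["cholera"], references["cholera"], 3))
print(mean_precision_at_k(results, references, 3))
```

Precision alone says nothing about experts the system missed, so a pilot would probably track recall against the same reference lists as well.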





📌 Example Output (Prototype UI Mock)



🔎 Search: vector-borne diseases
Top WHO Experts:
1. Dr. Pedro Alonso (Global Malaria Programme)
   - Keywords: malaria, Plasmodium, insecticide resistance
   - Docs: [Malaria Report 2023], [Vector Surveillance Guidelines]
2. Dr. Sarah Hoibak (Health Emergencies Programme)
   - Keywords: dengue, outbreak response, Aedes aegypti
   - Docs: [Yellow Fever Protocol], [Outbreak Alert 2024]






