Beatrust Scout's Matching Algorithm
Hi! I am Yongtae Hwang, an AI engineer at Beatrust. We have just released Beatrust Scout, a search system for discovering talent within a company (Figure 1). See our press release for more use cases.
Since I created and developed the search engine for this Scout, I would like to introduce the algorithm behind Beatrust Scout in this blog.
🎉 We obtained a Japanese patent for this algorithm (特許7574522). 🎉
Additionally, I would like to emphasize that after completing the algorithm development, implementing it into the product was an incredible experience. Working with talented dev members to write scalable and sustainable code was invaluable, and particularly implementing the logic in Golang has truly contributed to my growth as an engineer.
If you're interested in the application perspective of Scout, you can check out our technical blog here!
Introduction
As organizations grow, it becomes increasingly difficult to keep track of internal talent. Finding the right internal talent is particularly challenging in the following scenarios:
Typical Scenarios for Internal Talent Utilization
Project Launch and Expansion
Seeking advice on technical stack selection for new projects
Looking for experienced advisors from similar projects
Finding team members with complementary skills
Technical Problem Solving
Consulting with performance optimization experts
Searching for internal security review specialists
Finding engineers experienced in legacy system renovation
Recruitment and Development Activities
Looking for interviewers with specific technical stack expertise
Finding mentors for junior engineers
Searching for internal study session instructors
However, traditional internal talent search methods (keyword and category-based searches) have struggled to meet these specific needs due to the following limitations:
Limitations of Traditional Internal Talent Search
The Exact Match Constraint
Only finds talent with exact keyword matches
Recruiters need to know the precise keywords before searching
Employees may be limited to predetermined keywords when describing themselves
Flat Search Results
Search results displayed as simple unranked lists
Generic keywords like "engineer" or "sales" make it difficult to find appropriate talent
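The exact match constraint can be illustrated with a few lines of Python. This is a toy sketch, not how any particular product implements search: the profiles, tags, and the `exact_match_search` helper are all invented for illustration.

```python
# Toy profiles; the user names and tags are invented for illustration.
profiles = {
    "alice": ["Kubernetes", "microservices"],
    "bob": ["Skilled in cloud-native architecture design"],
}


def exact_match_search(query: str, profiles: dict[str, list[str]]) -> list[str]:
    """Return users who have a tag exactly equal to the query (case-insensitive)."""
    q = query.strip().lower()
    return [
        user
        for user, tags in profiles.items()
        if any(tag.strip().lower() == q for tag in tags)
    ]


# A precise keyword finds alice, but a reasonable paraphrase of bob's
# free-form tag finds nobody, and the hits come back as an unranked list.
print(exact_match_search("kubernetes", profiles))
print(exact_match_search("cloud native", profiles))
```

The second search returns nothing even though bob is clearly relevant, which is the gap that free-text matching is meant to close.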
Beatrust Scout's Matching Search
As mentioned above, traditional search methods struggle when users register free-form text.
Beatrust, on the other hand, provides an internal SNS-like feature called Beatrust People. In Beatrust People, each user has a profile page with Tags and a Project History (Figure 2) that convey who they are. Tags and Project Histories can contain free-form text, allowing for flexible self-expression.
Using the natural language processing techniques described below, Beatrust Scout can match a wide range of search queries against talent profiles and display the results in ranked order (Figure 1).
Benefits of Beatrust Scout
Advanced Free-Text Matching Capabilities
Scout's matching engine can effectively process and understand various forms of tag expressions, enabling accurate matching even with:
Complex technical descriptions like "Skilled in cloud-native architecture design"
Detailed experience descriptions like "Experience in 0-to-1 product launch at startups"
Multi-faceted Talent Search
Enables comprehensive talent discovery beyond technical skills:
Technical search: "Experience building microservices with Kubernetes"
Soft Skills search: "Proficient in remote team management"
Interests & Hobbies search: "Active in knowledge sharing with technical writing experience"
Flexible Search Methods
Accommodates various search query inputs
Specific Job Requirements
Short descriptions like "Development leader experience in fintech products at financial startups"
Direct utilization of existing job requirement documents
Casual Consultation Matching
"Someone who can provide career advice to engineers"
"Engineers balancing work with child-rearing"
Project-Based Matching
Automatic matching of optimal talent based on project overview
Example: Finding engineers with relevant tech stack and experience from "AI-powered image recognition system development project" overview
Intelligent Ranking Display
Results are ranked by how well each user matches the query, rather than returned as a flat list of users. For example, a search for "AI engineers with product development experience" might return, in order:
Engineers with AI product development leadership and extensive ML model implementation experience
Engineers with ML model consulting experience who partially contributed to AI product development
Engineers with ML knowledge and implementation experience but no AI product development experience
Results are ranked according to relevance.
This enables users to efficiently find optimal talent while allowing individuals to effectively showcase their experience and strengths.
Technical Overview
Challenges in Embedding-Based Matching
In AI-powered matching between search queries and profile information, the most basic approach is to use embedding technology. However, this method faces several fundamental challenges:
Challenges Due to Different Text Characteristics
Tags: Typically concise words or short phrases.
Project History: Fact-based descriptions of job duties and roles
Search Queries: Free-form text with varying expressions and lengths
Simply projecting these different types of text into the same embedding space may not appropriately represent the intended semantic similarities. This can result in unexpected matching results.
While this is a common challenge in embedding-based approaches, and various solutions like task-type specific embedding methods exist, we decided to develop and implement our own skill-based matching logic.
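For reference, the direct embedding approach discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the 2-dimensional vectors are made-up stand-ins for real embeddings, and `rank_by_embedding` is a hypothetical helper name, not part of Beatrust Scout.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_embedding(
    query_vec: list[float], item_vecs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Rank profile items by cosine similarity to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in item_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors: one for a short tag, one for a project description.
items = {"tag": [0.6, 0.8], "project_history": [0.8, 0.6]}
print(rank_by_embedding([1.0, 0.0], items))
```

Everything hinges on the tag, the project description, and the query landing in comparable regions of one vector space, which is exactly the assumption that differing text characteristics can break.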
Skill-Based Matching
To address the challenges posed by different text characteristics, we developed our own skill-based matching system inspired by HyDE (Hypothetical Document Embeddings).
HyDE addresses a crucial challenge in question-answering systems: the mismatch between the characteristics of questions and answers. While traditional methods compare questions directly with answers, HyDE first uses a language model to generate a hypothetical answer from the question. It then embeds this hypothetical answer and compares it with the embeddings of the actual, pre-prepared answers. Because the comparison is now answer-to-answer, it captures the intent of the question more precisely and identifies better answers.
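The HyDE flow can be sketched in a few lines. This is only an illustration of the idea: `generate_hypothetical_answer` returns a canned string in place of a real language model call, `embed` is a bag-of-words stand-in for a real embedding model, and the question and documents are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypothetical_answer(question: str) -> str:
    """Stand-in for the language model: returns a canned hypothetical answer."""
    canned = {
        "Who can help with slow database queries?": (
            "An engineer experienced in SQL index tuning "
            "and query performance optimization."
        ),
    }
    return canned.get(question, question)


def hyde_search(question: str, documents: list[str]) -> str:
    """Embed the hypothetical answer (not the question) and pick the closest document."""
    hypothetical_vec = embed(generate_hypothetical_answer(question))
    return max(documents, key=lambda doc: cosine(hypothetical_vec, embed(doc)))


docs = [
    "Led SQL index tuning and query performance optimization for the billing service",
    "Organized the company hackathon and internal study sessions",
]
print(hyde_search("Who can help with slow database queries?", docs))
```

The question itself shares almost no wording with the first document, but the hypothetical answer does, which is why the comparison is made on the answer side.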
We took HyDE's concept a step further to build a practical approach specialized for talent matching in Beatrust. The conventional method embeds the search query and the Tags/Project Histories directly and calculates their similarity (Figure 3, top). Our approach instead breaks each text down into specific skill elements and matches those skill elements against each other, which gives more accurate and reliable matching results (Figure 3, bottom).
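The skill-to-skill comparison can be sketched as below. This is a simplified illustration under several assumptions: `extract_skills` returns canned, invented skills in place of an LLM call, and token-level Jaccard overlap stands in for whatever similarity measure is used in production.

```python
import re

# Stand-in for LLM-based skill extraction. The skills listed here are
# invented for illustration and are not real Beatrust Scout output.
CANNED_SKILLS = {
    "AI engineers with product development experience": [
        "machine learning",
        "product development",
    ],
    "Built an image recognition product": [
        "machine learning",
        "computer vision",
        "product development",
    ],
    "From company establishment to IPO": [
        "company founding",
        "fundraising",
        "IPO preparation",
    ],
}


def extract_skills(text: str) -> list[str]:
    return CANNED_SKILLS.get(text, [])


def skill_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word tokens, used here in place of embedding similarity."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def match(query: str, profile_items: list[str]) -> dict[str, float]:
    """For each skill in the query, return the best-matching profile skill's score."""
    profile_skills = [s for item in profile_items for s in extract_skills(item)]
    return {
        query_skill: max(
            (skill_similarity(query_skill, ps) for ps in profile_skills), default=0.0
        )
        for query_skill in extract_skills(query)
    }


query = "AI engineers with product development experience"
print(match(query, ["Built an image recognition product"]))
print(match(query, ["From company establishment to IPO"]))
```

Because both sides are reduced to the same kind of short skill phrase before comparison, a long query and a terse tag are compared like with like.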
Decomposing Profiles (Tags/Project History) into Skills
How should we decompose a tag like "From company establishment to IPO" into skills? Simple keyword matching might extract skills like "company", "establishment", and "IPO", but the meaning of each skill would be unclear and difficult to use for searching.
Our approach uses an LLM to decompose each text into the skills hidden behind the Tags/Project Histories. Rather than treating all generated skills equally, we also obtain a score representing how closely each skill relates to the Tag/Project History (see the following tables of examples). This score helps produce more convincing matching results.
Using an LLM for skill extraction yielded more appropriate skills than we initially expected. Here are some examples (Figure 4).
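One way to picture this step is a prompt that asks for skills with relevance scores, plus a parser for the reply. The sketch below is hypothetical throughout: the prompt wording, the JSON reply format, the `ScoredSkill` type, and the sample reply are assumptions made for illustration, not the prompt or schema Beatrust Scout uses.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredSkill:
    name: str
    relevance: float  # assumed scale: 0.0 (unrelated) to 1.0 (core to the text)


# Hypothetical prompt wording.
PROMPT_TEMPLATE = (
    "List the skills implied by the following profile text. "
    'Return a JSON array of objects with keys "name" and "relevance" '
    "(a number from 0 to 1).\n\nText: {text}"
)


def build_prompt(text: str) -> str:
    return PROMPT_TEMPLATE.format(text=text)


def parse_skills(raw_json: str) -> list[ScoredSkill]:
    """Parse the model's JSON reply, clamp scores to [0, 1], sort by relevance."""
    skills = []
    for item in json.loads(raw_json):
        relevance = min(1.0, max(0.0, float(item["relevance"])))
        skills.append(ScoredSkill(item["name"].strip(), relevance))
    return sorted(skills, key=lambda s: s.relevance, reverse=True)


# Invented reply, standing in for what a model might return.
fake_reply = (
    '[{"name": "fundraising", "relevance": 0.7},'
    ' {"name": "IPO preparation", "relevance": 0.9}]'
)
print(build_prompt("From company establishment to IPO"))
print(parse_skills(fake_reply))
```

Clamping and sorting on the way in keeps downstream scoring code from having to trust the raw model output.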
Scoring
Beatrust Scout's scoring system goes beyond traditional self-declared skills by incorporating external validations and real-world experience. This multi-dimensional approach ensures more reliable and objective talent matching.
Tag Score Components
The tag score combines several validation factors:
Hybrid Similarity: The similarity score combines cosine similarity with keyword-matching logic
Relevance: How closely the extracted skill relates to the tag
Peer Endorsements: If tags are given by colleagues, the score is higher
+1 Endorsements: If a colleague confirms the tag by adding a "+1" endorsement to it, the score is higher
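The hybrid similarity component can be sketched as a blend of an embedding score and a keyword score. How the two are actually combined is not specified here, so the linear blend, the `alpha` weight, and the token-overlap keyword measure below are all assumptions made for illustration.

```python
import re


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of the query's word tokens that also appear in the text."""
    query_tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    text_tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def hybrid_similarity(cosine_score: float, query: str, text: str, alpha: float = 0.7) -> float:
    """Blend a precomputed embedding cosine score with keyword overlap.

    alpha is a hypothetical weight: 1.0 means embeddings only, 0.0 keywords only.
    """
    return alpha * cosine_score + (1.0 - alpha) * keyword_overlap(query, text)


# The cosine score 0.62 is a made-up value standing in for a real embedding comparison.
print(hybrid_similarity(0.62, "Kubernetes microservices",
                        "Experience building microservices with Kubernetes"))
```

A keyword term of this kind mainly helps with exact product names and abbreviations, where embedding similarity alone can be unreliable.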
Project History Score Components
The Project History score is based on:
Hybrid Similarity: The similarity score combines cosine similarity with keyword-matching logic
Relevance: How closely the extracted skill relates to the project
Duration of Involvement: Longer project engagement indicates deeper expertise
Project Recency: Recent experiences are weighted more heavily
The weighted sum of these individual scores ultimately determines the final score for each input skill and user, which is displayed in the results (Figure 5).
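The weighted sum can be sketched as follows. The component names mirror the lists above, but every number is an assumption chosen for illustration: the weights, the cap of five "+1" endorsements, the 24-month duration cap, and the 3-year recency half-life are not Beatrust Scout's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class TagEvidence:
    similarity: float      # hybrid similarity, 0..1
    relevance: float       # skill-to-tag relevance, 0..1
    given_by_peer: bool    # tag was added by a colleague
    plus_one_count: int    # number of "+1" endorsements


@dataclass
class ProjectEvidence:
    similarity: float      # hybrid similarity, 0..1
    relevance: float       # skill-to-project relevance, 0..1
    duration_months: float
    years_since_end: float


# Hypothetical weights; each set sums to 1 so scores stay in [0, 1].
TAG_WEIGHTS = {"similarity": 0.5, "relevance": 0.3, "peer": 0.1, "plus_one": 0.1}
PROJECT_WEIGHTS = {"similarity": 0.5, "relevance": 0.3, "duration": 0.1, "recency": 0.1}


def tag_score(e: TagEvidence) -> float:
    components = {
        "similarity": e.similarity,
        "relevance": e.relevance,
        "peer": 1.0 if e.given_by_peer else 0.0,
        "plus_one": min(e.plus_one_count, 5) / 5,  # saturates at 5 endorsements
    }
    return sum(TAG_WEIGHTS[name] * value for name, value in components.items())


def project_score(e: ProjectEvidence) -> float:
    components = {
        "similarity": e.similarity,
        "relevance": e.relevance,
        "duration": min(e.duration_months / 24, 1.0),  # saturates at 2 years
        "recency": 0.5 ** (e.years_since_end / 3.0),   # halves every 3 years
    }
    return sum(PROJECT_WEIGHTS[name] * value for name, value in components.items())


print(tag_score(TagEvidence(0.8, 0.9, given_by_peer=True, plus_one_count=3)))
print(project_score(ProjectEvidence(0.8, 0.9, duration_months=12, years_since_end=1.0)))
```

With this shape, a peer-given or endorsed tag always scores higher than an otherwise identical self-declared one, and an older project contributes less than a recent one of the same length.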
This approach offers several key advantages:
Reduced Self-Reporting Bias: External validations provide objective skill assessment
Real-World Expertise Verification: Project History confirms practical application
Dynamic Skill Evolution: Recent experiences reflect current capabilities
Community-Driven Validation: Peer endorsements strengthen credibility
Possible Extension: Visualization by Combining with Hierarchical Data
※ This visualization feature is not implemented as of 2024/11/19.
After decomposing profile items into skills, we can visualize individual or team skill hierarchies by combining this skill information with existing structured skill datasets (e.g., ESCO (European Skills, Competences, Qualifications and Occupations)). Figure 6 shows an example of a hierarchical dataset for programming-related skills.
Assuming that the skills extracted from Tags/Project Histories have names similar to the skills in the dataset, we can link each Tag/Project History to the existing skill dataset through embedding-based matching, which makes it possible to visualize the skills of an organization or an individual. (Figure 7 shows an example visualization of skills at Beatrust.)
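The linking step can be sketched as below. This is a rough illustration, not a description of a shipped feature: the tiny hierarchy is invented rather than taken from ESCO, `difflib.SequenceMatcher` string similarity stands in for embedding-based matching, and the 0.6 threshold is an arbitrary assumption.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Invented child -> parent hierarchy, standing in for a dataset such as ESCO.
HIERARCHY = {
    "Python programming": "Programming languages",
    "Go programming": "Programming languages",
    "SQL database design": "Databases",
}


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_skill(extracted: str, threshold: float = 0.6):
    """Return the closest dataset skill, or None if nothing is similar enough."""
    best = max(HIERARCHY, key=lambda node: name_similarity(extracted, node))
    return best if name_similarity(extracted, best) >= threshold else None


def summarize_by_category(extracted_skills: list[str]) -> dict[str, int]:
    """Count linked skills per parent category, e.g. as input for a chart."""
    counts: dict[str, int] = defaultdict(int)
    for skill in extracted_skills:
        node = link_skill(skill)
        if node is not None:
            counts[HIERARCHY[node]] += 1
    return dict(counts)


print(summarize_by_category(["python programming", "Go programing", "SQL database design"]))
```

Once each extracted skill is attached to a node, rolling counts up to parent categories gives the kind of per-person or per-team summary that a hierarchy visualization needs.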
Conclusion
In this blog post, we introduced the high-level logic behind Beatrust Scout, our new talent search system. We expect Beatrust Scout to be a major step forward in talent search, overcoming the limitations of traditional search methods.
Here are the key highlights:
Advanced Skill Extraction Technology
Sophisticated skill decomposition using an LLM
Context-aware weighted scoring
Ability to interpret technical abbreviations and even emojis
Reliable Matching System
Integration of external validations
Community-based verification system
Utilization of Project History
Flexible Search Capabilities
Free-text search functionality
Multi-dimensional talent discovery
Intelligent ranking display
Organizations implementing this system can benefit from:
More effective utilization of internal talent
More accurate talent matching
Efficient talent allocation and development
We Are Hiring!!
We are hiring those who are interested in working with many talented members in a bilingual environment!
https://en.corp.beatrust.com/careers