Beatrust Scout's Matching Algorithm

November 26, 2024 11:25

Hi! I am Yongtae Hwang, an AI engineer at Beatrust. We have just released Beatrust Scout, a search system for discovering talent within a company (Figure 1). See our press release for more use cases.

Figure 1. Search results in Beatrust Scout. Users can search for Beatrust users with many kinds of text queries.

Since I designed and developed the search engine behind Scout, I would like to introduce its algorithm in this post.
🎉 We have obtained a Japanese patent for this algorithm (Japanese Patent No. 7574522). 🎉

Additionally, I would like to emphasize that implementing the algorithm in the product after its development was an incredible experience. Working with talented developers to write scalable and sustainable code was invaluable, and implementing the logic in Golang in particular has truly contributed to my growth as an engineer.
If you're interested in the application side of Scout, check out our technical blog here!

Introduction

As organizations grow, it becomes increasingly difficult to keep track of internal talent. Finding the right internal talent is particularly challenging in the following scenarios:

Typical Scenarios for Internal Talent Utilization

Project Launch and Expansion
- Seeking advice on tech stack selection for new projects
- Looking for experienced advisors from similar projects
- Finding team members with complementary skills
Technical Problem Solving
- Consulting with performance optimization experts
- Searching for internal security review specialists
- Finding engineers experienced in legacy system renovation
Recruitment and Development Activities
- Looking for interviewers with specific technical stack expertise
- Finding mentors for junior engineers
- Searching for internal study session instructors

However, traditional internal talent search methods (keyword- and category-based search) struggle to meet these specific needs because of the following limitations:

Limitations of Traditional Internal Talent Search

The Exact Match Constraint
1. Only finds talent with exact keyword matches
2. Recruiters need to know the precise keywords before searching
3. Employees may be limited to predetermined keywords
Flat Search Results
1. Search results are displayed as simple, unranked lists
2. Generic keywords like "engineer" or "sales" match so many people that, without ranking, it is difficult to find appropriate talent
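To make the exact-match constraint concrete, here is a minimal sketch of a naive keyword search. The profile data and the `keyword_search` helper are hypothetical and not part of Beatrust's system; the point is only that a query term must appear verbatim in a tag to produce a hit, and that every hit comes back unranked.

```python
# Hypothetical toy data: user -> list of free-text tags.
profiles = {
    "alice": ["Kubernetes", "Go"],
    "bob": ["k8s operator development"],
    "carol": ["Cloud-native architecture design"],
}

def keyword_search(query: str, profiles: dict[str, list[str]]) -> list[str]:
    """Return users whose tags contain the query as a case-insensitive substring.

    The result is a flat, unranked list: every hit is treated as equally relevant.
    """
    q = query.lower()
    return [
        user
        for user, tags in profiles.items()
        if any(q in tag.lower() for tag in tags)
    ]

# bob's "k8s" tag describes the same technology but is not found.
print(keyword_search("Kubernetes", profiles))
```

Searching for "Kubernetes" finds only alice, even though bob's tag refers to the same technology under a common abbreviation.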

Beatrust Scout's Matching Search

As mentioned above, traditional search methods struggle when users enter free-form text.
Meanwhile, Beatrust provides an internal SNS-like feature called Beatrust People, where each user has a profile page with Tags and a Project History (Figure 2) that express who they are. Tags and Project Histories can contain free-form text, allowing for flexible self-expression.

Figure 2. The author's profile page in Beatrust People. The light blue and yellow boxes in the middle of the image are Tags, which express the user's personality. A light blue Tag was created by the user, and a yellow Tag was received from someone else. The number in the upper right corner of each Tag is the number of +1 endorsements from others. The bottom of the image shows the author's Project History.

Using the natural language processing techniques described below, Beatrust Scout can effectively match various search queries with talent profiles and display the results in ranked order (Figure 1).

Benefits of Beatrust Scout

Advanced Free-Text Matching Capabilities
- Scout's matching engine can effectively process and understand various forms of tag expressions, enabling accurate matching even with:
  - Complex technical descriptions like "Skilled in cloud-native architecture design"
  - Detailed experience descriptions like "Experience in 0-to-1 product launch at startups"
Multi-faceted Talent Search
- Enables comprehensive talent discovery beyond technical skills:
  - Technical search: "Experience building microservices with Kubernetes"
  - Soft Skills search: "Proficient in remote team management"
  - Interests & Hobbies search: "Active in knowledge sharing with technical writing experience"
Flexible Search Methods
- Accommodates various search query inputs
  - Specific Job Requirements
    - Short descriptions like "Development leader experience in fintech products at financial startups"
    - Direct utilization of existing job requirement documents
  - Casual Consultation Matching
    - "Someone who can provide career advice to engineers"
    - "Engineers balancing work with child-rearing"
  - Project-Based Matching
    - Automatic matching of optimal talent based on project overview
    - Example: Finding engineers with relevant tech stack and experience from "AI-powered image recognition system development project" overview
Intelligent Ranking Display
- Results are ranked by relevance to the query rather than shown as a flat list of users. For example, when searching for "AI engineers with product development experience", candidates appear in this order:
  - Engineers with AI product development leadership and extensive ML model implementation experience
  - Engineers with ML model consulting experience who partially contributed to AI product development
  - Engineers with ML knowledge and implementation experience but no AI product development experience

This enables users to efficiently find optimal talent while allowing individuals to effectively showcase their experience and strengths.

Technical Overview

Challenges in Embedding-Based Matching

In AI-powered matching between search queries and profile information, the most basic approach is to embed both texts as vectors and compare them, for example by cosine similarity. However, this method faces several fundamental challenges:

Challenges Due to Different Text Characteristics
- Tags: Typically concise words or short phrases.
- Project History: Fact-based descriptions of job duties and roles
- Search Queries: Free-form text with varying expressions and lengths

Simply projecting these different types of text into the same embedding space may not capture the intended semantic similarity. For example, a one-word Tag such as "Go" and a long query such as "someone who can lead backend development in Go" differ so much in length and style that their vectors can end up far apart even though they match, which leads to unexpected results.
This is a common challenge in embedding-based approaches, and solutions such as task-type-specific embeddings exist; nevertheless, we decided to develop and implement our own skill-based matching logic.
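The baseline described above, embedding the query and each Tag directly and comparing them, can be sketched as follows. A real system would use a learned text-embedding model; here `embed` is a hypothetical bag-of-words stand-in so that the example runs without dependencies. It shows the shape of the approach and how a terse Tag shares few dimensions with a long, free-form query.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for a text-embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = "someone who can lead backend development in go"
tags = ["go", "backend development", "ui design"]

# Direct matching: compare the query vector with each tag vector as-is.
for tag in tags:
    print(f"{tag!r}: {cosine(embed(query), embed(tag)):.3f}")
```

Even the clearly relevant Tag "go" scores only about 0.35 here, because most of the query's words have no counterpart in a one-word Tag.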

Skill-Based Matching

To address the challenges posed by different text characteristics, we developed our own skill-based matching system inspired by HyDE (Hypothetical Document Embeddings).
HyDE offers an effective approach to a crucial challenge in question-answering systems: the mismatch between the characteristics of questions and answers. Whereas traditional methods compare questions with answers directly, HyDE first uses a language model to generate a hypothetical answer to the question. By embedding this hypothetical answer and comparing it with the embeddings of the actual, pre-existing answers, it achieves more accurate matching, capturing the intent of the question more precisely and identifying better answers.
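A minimal sketch of the HyDE idea, under stated assumptions: `generate_hypothetical_answer` is a hypothetical stub standing in for an LLM call, and `embed` is a bag-of-words stand-in for an embedding model. The key step is that the question itself is not embedded; the generated hypothetical answer is, and it is compared with the real documents.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model (bag of words)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate_hypothetical_answer(question: str) -> str:
    """Hypothetical stub for an LLM call that writes a plausible answer.

    A real implementation would prompt a language model; this returns a fixed string.
    """
    return "kubernetes is a container orchestration system"

def hyde_search(question: str, documents: list[str]) -> list[tuple[str, float]]:
    """Rank documents by similarity to a hypothetical answer, not to the question."""
    hypo_vec = embed(generate_hypothetical_answer(question))
    scored = [(doc, cosine(hypo_vec, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = [
    "kubernetes is an open source container orchestration system",
    "what is kubernetes",
]
for doc, score in hyde_search("what is kubernetes?", docs):
    print(f"{score:.3f}  {doc}")
```

The answer-shaped document ranks first because it resembles the hypothetical answer, whereas the question-shaped document only shares a couple of words with it.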
We took HyDE's concept further to establish a practical approach specialized for talent matching in Beatrust. Whereas a conventional method directly embeds the search query and Tags/Project Histories to calculate similarity (Figure 3, top), our approach breaks each text down into specific skill elements. By matching these skill elements, we have succeeded in providing more accurate and reliable matching results (Figure 3, bottom).

Figure 3. Schematic of the Scout algorithm. The upper part shows direct matching between user information and a query. Because the connection between user information and the query is weak under direct matching, it is hard to produce reasonable results. Instead, we generate skills as a connector and use them for matching (red area), where the yellow boxes show skill names and their relevance.
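The skill-as-connector idea can be sketched as follows. This is not Beatrust's implementation: the skill lists and relevance values are hand-written placeholders for what an LLM would produce, and matching is simplified to exact skill-name equality, whereas Scout uses a hybrid similarity between skills. Each shared skill contributes the product of its query-side and profile-side relevance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    name: str
    relevance: float  # 0.0-1.0: how strongly the skill is implied by the source text

# Hypothetical LLM output: skills decomposed from the search query.
query_skills = [Skill("kubernetes", 0.9), Skill("microservices", 0.8), Skill("team leadership", 0.4)]

# Hypothetical LLM output: skills decomposed from each user's tags.
user_skills = {
    "alice": [Skill("kubernetes", 0.8), Skill("docker", 0.6)],
    "bob": [Skill("microservices", 0.9), Skill("team leadership", 0.7)],
}

def skill_match_score(query: list[Skill], profile: list[Skill]) -> float:
    """Sum relevance products over skills shared by the query and the profile.

    Exact name equality is a simplification; a real system would compare
    skill names with a similarity function instead.
    """
    profile_by_name = {s.name: s.relevance for s in profile}
    return sum(q.relevance * profile_by_name[q.name] for q in query if q.name in profile_by_name)

ranking = sorted(user_skills, key=lambda u: skill_match_score(query_skills, user_skills[u]), reverse=True)
print(ranking)
```

Neither profile needs to share wording with the raw query; they meet through the intermediate skill names, and bob ranks first because he covers two of the query's skills.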

Decomposing Profiles (Tags/Project Histories) into Skills

How should we decompose a tag like "From company establishment to IPO" into skills? Simple keyword extraction might yield skills like "company", "establishment", and "IPO", but the meaning of each would be unclear and hard to use for searching.
Our approach uses an LLM to decompose text into the skills implicit in Tags/Project Histories. Moreover, instead of treating all generated skills equally, we also obtain a score representing how closely each skill relates to the Tag/Project History (see the examples below). This score helps produce more convincing matching results.
Using an LLM for skill extraction yielded more appropriate skills than we initially expected. Here are some examples (Figure 4).

Figure 4. Examples of Skill Extractions from Tags
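One way to obtain skills together with relevance scores is to ask the LLM for structured JSON and validate what comes back. The prompt wording, the JSON schema, and the `call_llm` stub below are hypothetical, and the canned response is invented for illustration; the post does not disclose Beatrust's prompt or output format.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    name: str
    relevance: float

# Hypothetical prompt template; the actual prompt used by Scout is not published.
PROMPT = (
    "List the skills implied by the following profile text. "
    'Return JSON: [{{"skill": str, "relevance": float between 0 and 1}}].\n'
    "Text: {text}"
)

def call_llm(prompt: str) -> str:
    """Hypothetical stub for an LLM API call; returns an invented canned response."""
    return '[{"skill": "Corporate governance", "relevance": 0.7}, {"skill": "Fundraising", "relevance": 0.9}]'

def extract_skills(text: str) -> list[Skill]:
    """Ask the LLM for skills, then parse and validate its JSON answer."""
    raw = json.loads(call_llm(PROMPT.format(text=text)))
    skills = []
    for item in raw:
        relevance = float(item["relevance"])
        # Clamp to [0, 1] in case the model returns an out-of-range value.
        skills.append(Skill(item["skill"].strip(), min(max(relevance, 0.0), 1.0)))
    # Most relevant skills first.
    return sorted(skills, key=lambda s: s.relevance, reverse=True)

print(extract_skills("From company establishment to IPO"))
```

Requesting a fixed schema and clamping scores keeps downstream scoring robust to the occasional malformed or out-of-range model output.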

Scoring

Beatrust Scout's scoring system goes beyond traditional self-declared skills by incorporating external validations and real-world experience. This multi-dimensional approach ensures more reliable and objective talent matching.
Tag Score Components
The tag score combines several validation factors:

Hybrid Similarity: In addition to cosine similarity, keyword matching logic contributes to the similarity score
Relevance: How closely each skill relates to the Tag
Peer Endorsements: Tags given by colleagues receive a higher score
+1 Endorsements: If a colleague confirms a tag with a "+1" endorsement, its score is higher
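A sketch of how these four factors could be combined into one tag score. The post does not publish the formula, so everything here is an assumption: the hybrid similarity is a weighted mix of a precomputed cosine similarity and token-overlap (Jaccard) keyword matching, and the endorsement factors are multiplicative boosts with made-up constants (`ALPHA`, `PEER_BOOST`, `PLUS_ONE_BOOST`, `PLUS_ONE_CAP`).

```python
from dataclasses import dataclass

# Hypothetical constants; the real weights are not disclosed.
ALPHA = 0.7            # weight of embedding cosine vs. keyword overlap
PEER_BOOST = 1.2       # multiplier when the tag was given by a colleague
PLUS_ONE_BOOST = 0.05  # extra multiplier per "+1" endorsement
PLUS_ONE_CAP = 10      # endorsements beyond this count add nothing

@dataclass
class TagMatch:
    search_skill: str
    tag_skill: str
    cosine: float        # cosine similarity of the two skill embeddings (precomputed)
    relevance: float     # relevance of tag_skill to its tag, from the LLM (0-1)
    given_by_peer: bool  # True if a colleague gave this tag
    plus_ones: int       # number of "+1" endorsements on the tag

def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens: a simple keyword-matching signal."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def hybrid_similarity(m: TagMatch) -> float:
    """Blend embedding similarity with keyword overlap."""
    return ALPHA * m.cosine + (1 - ALPHA) * keyword_overlap(m.search_skill, m.tag_skill)

def tag_score(m: TagMatch) -> float:
    """Similarity weighted by relevance, then boosted by external validations."""
    score = hybrid_similarity(m) * m.relevance
    if m.given_by_peer:
        score *= PEER_BOOST
    score *= 1 + PLUS_ONE_BOOST * min(m.plus_ones, PLUS_ONE_CAP)
    return score

self_tag = TagMatch("kubernetes operation", "kubernetes", 0.8, 0.9, False, 0)
peer_tag = TagMatch("kubernetes operation", "kubernetes", 0.8, 0.9, True, 3)
print(tag_score(self_tag), tag_score(peer_tag))
```

Capping the "+1" count is one way to keep a very popular tag from dominating the similarity signal; whether Scout does anything similar is not stated.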

Project History Score Components
The Project History score is based on:

Hybrid Similarity: In addition to cosine similarity, keyword matching logic contributes to the similarity score
Relevance: How closely each skill relates to the project
Duration of Involvement: Longer project engagement indicates deeper expertise
Project Recency: Recent experiences are weighted more heavily
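A comparable sketch for Project History. The saturating duration curve, the exponential recency decay, the multiplicative combination, and the constants `DURATION_SCALE_MONTHS` and `RECENCY_HALF_LIFE_MONTHS` are all assumptions chosen for illustration; the post only says that longer and more recent projects weigh more.

```python
import math

DURATION_SCALE_MONTHS = 12.0     # hypothetical: about 63% of full duration credit after one year
RECENCY_HALF_LIFE_MONTHS = 24.0  # hypothetical: weight halves every two years since the project ended

def duration_factor(months: float) -> float:
    """Increases with involvement length and saturates toward 1.0."""
    return 1 - math.exp(-months / DURATION_SCALE_MONTHS)

def recency_factor(months_since_end: float) -> float:
    """Exponential decay: 1.0 for an ongoing project, 0.5 after one half-life."""
    return 0.5 ** (months_since_end / RECENCY_HALF_LIFE_MONTHS)

def project_score(hybrid_sim: float, relevance: float, months: float, months_since_end: float) -> float:
    """Combine the four Project History components multiplicatively (an assumption)."""
    return hybrid_sim * relevance * duration_factor(months) * recency_factor(months_since_end)

recent = project_score(0.8, 0.9, months=18, months_since_end=0)
old = project_score(0.8, 0.9, months=18, months_since_end=48)
print(round(recent, 3), round(old, 3))
```

A saturating curve avoids rewarding very long projects without bound, and a half-life makes the meaning of the recency constant easy to reason about.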

The weighted sum of these individual scores determines the final score for each search skill and user, which is displayed in the results (Figure 5).

Figure 5. Beatrust Scout's search results interface. The bars and scores in the center show how well the candidate matches each search skill. The overall score on the right indicates the candidate's total match with the search query.
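The aggregation step, sketched under assumptions: for each search skill, a user's best tag score and best project score are blended with hypothetical weights `W_TAG` and `W_PROJECT`, and the per-skill scores are averaged into the overall score used for ranking. The post states only that a weighted sum determines the final score; taking the maximum per source and averaging across skills are illustrative choices.

```python
from statistics import fmean

W_TAG = 0.6      # hypothetical weight for evidence from Tags
W_PROJECT = 0.4  # hypothetical weight for evidence from Project History

# Hypothetical precomputed scores: user -> search skill -> (tag scores, project scores).
scores = {
    "alice": {"kubernetes": ([0.9, 0.4], [0.7]), "team leadership": ([], [0.2])},
    "bob": {"kubernetes": ([0.5], []), "team leadership": ([0.8], [0.9])},
}

def skill_score(tag_scores: list[float], project_scores: list[float]) -> float:
    """Per-skill score: weighted sum of the best tag and best project evidence."""
    best_tag = max(tag_scores, default=0.0)
    best_project = max(project_scores, default=0.0)
    return W_TAG * best_tag + W_PROJECT * best_project

def overall_score(per_skill: dict[str, tuple[list[float], list[float]]]) -> float:
    """Overall score: mean of the per-skill scores across all search skills."""
    return fmean(skill_score(t, p) for t, p in per_skill.values())

ranking = sorted(scores, key=lambda u: overall_score(scores[u]), reverse=True)
for user in ranking:
    per_skill = {s: round(skill_score(*tp), 2) for s, tp in scores[user].items()}
    print(user, per_skill, round(overall_score(scores[user]), 3))
```

Keeping per-skill scores alongside the overall score mirrors the interface in Figure 5, where each search skill has its own bar.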

This approach offers several key advantages:

Reduced Self-Reporting Bias: External validations provide objective skill assessment
Real-World Expertise Verification: Project History confirms practical application
Dynamic Skill Evolution: Recent experiences reflect current capabilities
Community-Driven Validation: Peer endorsements strengthen credibility

Possible Extension: Visualization Through Combination with Hierarchical Data

※ This visualization feature is not yet implemented (as of 2024/11/19).
After decomposing profile items into skills, we can visualize individual or team skill hierarchies by combining this skill information with existing structured skill datasets (e.g., ESCO: European Skills, Competences, Qualifications and Occupations). Figure 6 shows an example of a hierarchical dataset of programming-related skills.

Figure 6. An example of a structured skill dataset such as ESCO

Assuming that skills extracted from Tags/Project Histories have names similar to those in the dataset, we can link each Tag/Project History to existing skill datasets through embedding-based matching, enabling visualization of an organization's or individual's skills (an example for Beatrust is shown in Figure 7).

Figure 7. Visualization of the skills of the whole Beatrust organization using ESCO, drawn as hierarchical circles. The central circle has the most abstract meaning, and the skills toward the edges are more concrete. The size of each category indicates its share of the tags; at Beatrust, many tags are clearly related to communication.
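A sketch of the linking step behind such a visualization: each extracted skill is mapped to its most similar leaf in a skill hierarchy, and counts are rolled up to every ancestor so that each category's share of tags can be drawn as circle size. The tiny hierarchy below is hand-written, not real ESCO data; `embed` is a bag-of-words stand-in for an embedding model, and the threshold `MIN_SIM` is hypothetical.

```python
import math
from collections import Counter
from typing import Optional

# Hypothetical mini-hierarchy (child -> parent); real ESCO data is far larger.
PARENT = {
    "python programming": "programming",
    "go programming": "programming",
    "programming": "ict skills",
    "public speaking": "communication",
    "negotiation": "communication",
}
MIN_SIM = 0.3  # hypothetical threshold below which a skill stays unlinked

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model (bag of words)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def link(skill: str) -> Optional[str]:
    """Return the most similar leaf node, or None if nothing is similar enough."""
    leaves = [n for n in PARENT if n not in PARENT.values()]
    best = max(leaves, key=lambda n: cosine(embed(skill), embed(n)))
    return best if cosine(embed(skill), embed(best)) >= MIN_SIM else None

def category_counts(skills: list[str]) -> Counter:
    """Count each linked skill at its leaf and at every ancestor up to the root."""
    counts = Counter()
    for skill in skills:
        node = link(skill)
        while node is not None:
            counts[node] += 1
            node = PARENT.get(node)
    return counts

counts = category_counts(["python", "go programming", "public speaking", "negotiation skills"])
print(counts.most_common())
```

Dividing each count by the number of linked skills gives the share that would determine circle size in a view like Figure 7.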

Conclusion

In this post, we introduced the high-level logic behind Beatrust Scout, our new talent-finding system. We expect Beatrust Scout to be a major breakthrough in talent search technology, overcoming the limitations of traditional search methods.

Here are the key highlights:

Advanced Skill Extraction Technology
- Sophisticated skill decomposition using LLM
- Context-aware weighted scoring
- Ability to interpret technical abbreviations and even emojis
Reliable Matching System
- Integration of external validations
- Community-based verification system
- Utilization of Project History
Flexible Search Capabilities
- Free-text search functionality
- Multi-dimensional talent discovery
- Intelligent ranking display

Organizations implementing this system can benefit from:

More effective utilization of internal talent
More accurate talent matching
Efficient talent allocation and development

We Are Hiring!!

We are looking for people who are interested in working with many talented colleagues in a bilingual environment!
https://en.corp.beatrust.com/careers