Daily PubMed Search and AI-Powered Paper Summaries: An Automated Email Notification System

November 13, 2024, 19:14

Introduction

For researchers, staying abreast of the latest medical research papers is crucial yet time-consuming. This article introduces an automated system that conducts daily PubMed searches based on predefined keywords, summarizes the findings using artificial intelligence, and delivers the results via email.

The primary goals of this system are to:

Enable laboratory members to efficiently track the latest research trends
Minimize the risk of overlooking relevant papers
Facilitate quick comprehension of English papers through Japanese summaries
Support students in enhancing their research literacy and English proficiency

System Overview

The automated system comprises the following components:

A Python virtual environment on a Raspberry Pi
PubMed API (NCBI E-utilities) for paper searches
OpenAI GPT model for paper summarization
SMTP library for email transmission

Implementation Process

Raspberry Pi Virtual Environment Setup

Begin by verifying your Python version and creating a virtual environment:

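# Check your Python version
python3 -V

# Create a virtual environment with venv (sudo is not needed inside your home directory)
python3 -m venv project_env
source project_env/bin/activate

# Only if the environment was created with sudo: return ownership to your user
sudo chown -R user:user /home/user/virtual_environment_folder_name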
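
To confirm that the activated environment is the one actually in use, a quick check can be run from Python. This is a minimal sketch; the helper name in_virtualenv is hypothetical and not part of the script described later.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the system-wide installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False, the environment has not been activated in the current shell.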

Library Installation

pip install openai requests xmltodict python-dotenv
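
After installation, it can be worth checking that every package is importable from the virtual environment. The sketch below uses only the standard library; the function name missing_modules is a hypothetical helper, and the list assumes the import names that correspond to the packages above (python-dotenv is imported as dotenv).

```python
from importlib.util import find_spec

# Import names for the installed packages; note that the
# python-dotenv package is imported under the name "dotenv".
REQUIRED_MODULES = ["openai", "requests", "xmltodict", "dotenv"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```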

OpenAI API Key Acquisition

Create an account on the OpenAI website and obtain an API key.

https://qiita.com/kofumi/items/16a9a501ffc8dd49da50
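
Once the key is available as an environment variable (the .env file is covered in a later section), you may want to confirm that it is loaded without printing it in full. A minimal sketch, assuming a hypothetical helper named mask_secret:

```python
import os


def mask_secret(value, visible=4):
    """Hide all but the last few characters of a secret for safe logging."""
    if not value:
        return "(not set)"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    # Prints only the last four characters of the key, or "(not set)".
    print("OPENAI_API_KEY:", mask_secret(os.getenv("OPENAI_API_KEY")))
```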

Python Script

Create a Python script (e.g., script.py) that performs paper searches, summarizations, and email notifications.

cd hoge
sudo nano script.py

import openai  # Needed for openai.RateLimitError in the retry handler below
from openai import OpenAI
import os
import requests
import xmltodict
from datetime import datetime, timedelta
from dotenv import load_dotenv
import time
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

SMTP_SERVER = os.getenv("SMTP_SERVER")
SMTP_PORT = int(os.getenv("SMTP_PORT", "587"))  # Environment values are strings; convert to int
SMTP_USERNAME = os.getenv("SMTP_USERNAME")
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD")
SENDER_EMAIL = os.getenv("SENDER_EMAIL")
RECIPIENT_EMAIL = os.getenv("RECIPIENT_EMAIL")
CC_EMAIL = os.getenv("CC_EMAIL", "")
BCC_EMAIL = os.getenv("BCC_EMAIL", "")

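# Split on commas and trim the whitespace around each query; skip empty entries.
PUBMED_QUERIES = [q.strip() for q in os.getenv("PUBMED_QUERIES", "").split(",") if q.strip()]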

PUBMED_PUBTYPES = [
    "Journal Article",
    "Books and Documents",
    "Clinical Trial",
    "Meta-Analysis",
    "Randomized Controlled Trial",
    "Review",
    "Systematic Review",
]
PUBMED_TERM = 1  # Number of days to look back from today

PROMPT_PREFIX = (
    "You are a highly educated and trained researcher. Please explain the following paper in Japanese, separating the title and summary with line breaks. Be sure to write the main points in bullet-point format."
)

def main():
    client = OpenAI(api_key=OPENAI_API_KEY)
    
    today = datetime.now()
    yesterday = today - timedelta(days=PUBMED_TERM)

    for query in PUBMED_QUERIES:
        for _ in range(5):  # Retry at most 5 times per query instead of looping forever
            try:
                ids = get_paper_ids_on(yesterday, query)
                print(f"Number of paper IDs for {query}: {len(ids)}")
                output = ""
                paper_count = 0
                for i, id in enumerate(ids):
                    summary = get_paper_summary_by_id(id)
                    pubtype_check_result = check_pubtype(summary["pubtype"])
                    print(f"ID {id} pubtype: {summary['pubtype']}, Check result: {pubtype_check_result}")
                    if not pubtype_check_result:
                        continue
                    paper_count += 1
                    abstract = get_paper_abstract_by_id(id)
                    print(f"ID {id} title: {summary['title']}")
                    print(f"ID {id} abstract: {abstract}\n")
                    input_text = f"\ntitle: {summary['title']}\nabstract: {abstract}"

                    response = client.chat.completions.create(
                        messages=[
                            {
                                "role": "user",
                                "content": PROMPT_PREFIX + "\n" + input_text,
                            },
                        ],
                        model="gpt-4o-mini",
                    )
                    
                    content = response.choices[0].message.content.strip()
                    
                    pubmed_url = f"https://pubmed.ncbi.nlm.nih.gov/{id}"
                    output += f"PubMed New Paper Notification ({query})\n\n{content}\n\n{pubmed_url}\n\n\n"

                if output:
                    send_email(query, output, to_yyyymmdd(yesterday))
                else:
                    print(f"No new papers for query: {query}")

                break
                
            except openai.RateLimitError as e:
                print("Rate limit exceeded. Waiting for 300 seconds before retrying.")
                time.sleep(300)
            except Exception as e:
                print(f"An error occurred: {e}")
                time.sleep(60)

def to_yyyymmdd(date):
    return date.strftime("%Y/%m/%d")

def get_paper_ids_on(date, query):
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&sort=pub_date&term={query}&mindate={to_yyyymmdd(date)}&maxdate={to_yyyymmdd(date)}&retmax=1000&retstart=0"
    res = requests.get(url).json()
    return res["esearchresult"]["idlist"]

def get_paper_summary_by_id(id):
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=json&id={id}"
    res = requests.get(url).json()
    return res["result"][id]

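def get_paper_abstract_by_id(id):
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id={id}"
    res = requests.get(url).text
    xml_dict = xmltodict.parse(res)
    # Some records (e.g., books) have no "PubmedArticle" element, so avoid a KeyError.
    root = xml_dict.get("PubmedArticleSet") or {}
    article = (root.get("PubmedArticle") or {}).get("MedlineCitation", {}).get("Article", {})
    abstract = (article.get("Abstract") or {}).get("AbstractText", "")
    # Structured abstracts parse to a list of sections; a section may be a dict holding "#text".
    sections = abstract if isinstance(abstract, list) else [abstract]
    texts = [s.get("#text", "") if isinstance(s, dict) else (s or "") for s in sections]
    return "\n".join(t for t in texts if t)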

def check_pubtype(pubtypes):
    return any(pubtype in PUBMED_PUBTYPES for pubtype in pubtypes)

def send_email(query, content, search_date):
    msg = MIMEMultipart('alternative')
    msg['Subject'] = f'New Paper Notification ({query}) - Search Date: {search_date}'
    msg['From'] = SENDER_EMAIL
    msg['To'] = RECIPIENT_EMAIL
    if CC_EMAIL:
        msg['Cc'] = CC_EMAIL
    
    text = content

    html = content.replace('\n', '<br>')
    html = f'<html><body>{html}</body></html>'

    part1 = MIMEText(text, 'plain')
    part2 = MIMEText(html, 'html')
    msg.attach(part1)
    msg.attach(part2)

    try:
        with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as server:
            server.starttls()
            server.login(SMTP_USERNAME, SMTP_PASSWORD)
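            # Split comma-separated address lists so each recipient is passed separately.
            recipients = [addr.strip() for addr in RECIPIENT_EMAIL.split(',')]
            if CC_EMAIL:
                recipients.extend(addr.strip() for addr in CC_EMAIL.split(','))
            if BCC_EMAIL:
                recipients.extend(addr.strip() for addr in BCC_EMAIL.split(','))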
            server.sendmail(SENDER_EMAIL, recipients, msg.as_string())
        print(f"Email sent for query: {query}")
    except Exception as e:
        print(f"Failed to send email for query {query}. Error: {e}")

if __name__ == "__main__":
    main()
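
The search request in get_paper_ids_on is assembled by hand as one long URL. As an alternative, the parameters can be kept in a dictionary and URL-encoded with the standard library, so that spaces and slashes in the query and dates are escaped explicitly. The sketch below is illustrative only: build_esearch_url is a hypothetical helper, and passing datetype="edat" (Entrez date) is an assumption that the original script does not make; check the E-utilities documentation to decide which date type suits your use.

```python
from datetime import date
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, day, datetype="edat"):
    """Build an ESearch URL for papers matching `query` on a single day."""
    day_str = day.strftime("%Y/%m/%d")
    params = {
        "db": "pubmed",
        "retmode": "json",
        "term": query,
        # Assumption: filter on the Entrez date; "pdat" would filter
        # on the publication date instead.
        "datetype": datetype,
        "mindate": day_str,
        "maxdate": day_str,
        "retmax": 1000,
    }
    # urlencode escapes spaces as "+" and slashes as "%2F".
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_esearch_url("heart failure", date(2024, 11, 12)))
```

The resulting string can be passed to requests.get in the same way as the hand-built URL.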

Environment Variable Configuration

Create a .env file to store sensitive information such as API keys and email credentials.

cd hoge
sudo nano .env

# OpenAI API Key
OPENAI_API_KEY=Enter your OpenAI API key here

# Email settings
SMTP_SERVER=Enter your SMTP server address here
SMTP_PORT=587 # Depends on your email system
SMTP_USERNAME=Enter your email ID here
SMTP_PASSWORD=Enter your password here
SENDER_EMAIL=Enter sender's email address here
RECIPIENT_EMAIL=Enter recipient's email address here # For multiple recipients: xxx@xxx.xx, yyy@yyy.yy
CC_EMAIL=Enter CC email address here # Can be left blank. For multiple recipients: xxx@xxx.xx, yyy@yyy.yy
BCC_EMAIL=Enter BCC email address here # Can be left blank. For multiple recipients: xxx@xxx.xx, yyy@yyy.yy

# PubMed search queries, separated by commas
PUBMED_QUERIES=term1 term2, term3, term4, term5 term6 term7
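
Several values in this file are comma-separated lists with spaces after the commas, so they need to be split and trimmed before use. A minimal sketch of that step, assuming a hypothetical helper named split_csv:

```python
def split_csv(raw):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in (raw or "").split(",") if item.strip()]


# Spaces inside a query are kept; only the separators are removed.
queries = split_csv("term1 term2, term3, term4, term5 term6 term7")
recipients = split_csv("xxx@xxx.xx, yyy@yyy.yy")
print(queries)
print(recipients)
```

Each query keeps its internal spaces, so "term1 term2" is sent to PubMed as a single search, and an empty or missing value yields an empty list rather than an error.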

Execution Script Creation

Develop a shell script (script.sh) to activate the virtual environment and run the Python script.

cd hoge
sudo nano script.sh

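#!/bin/bash
# "source" is a bash feature, so the script uses bash rather than sh.
# The paths below are placeholders; replace them with your own directories.
PROG_DIR=/home/fuge/hoge
source hoge/bin/activate
python3 "$PROG_DIR/script.py"
deactivate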

Crontab Configuration

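Open the crontab editor:

crontab -e

Then add one of the following two entries (not both, or the script will run twice each day). The first changes to the project directory and runs the shell script:

0 7 * * * cd /home/fuga/huge; sudo bash script.sh

The second skips the shell script and calls the virtual environment's Python interpreter directly:

0 7 * * * /home/user/hoge/venv/bin/python /home/user/fuga/huge/script.py

Either entry runs the script every morning at 7:00 AM.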
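
For readers new to cron, the five leading fields of an entry are minute, hour, day of month, month, and day of week, followed by the command. The sketch below splits an entry into those parts; describe_cron_schedule is a hypothetical helper for illustration, and the command path in the example is a placeholder.

```python
CRON_FIELDS = ("minute", "hour", "day of month", "month", "day of week")


def describe_cron_schedule(entry):
    """Map the five schedule fields of a crontab entry to their names."""
    # Split into at most six parts: five schedule fields plus the command.
    parts = entry.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    schedule["command"] = parts[5]
    return schedule


for name, value in describe_cron_schedule("0 7 * * * /path/to/command --flag").items():
    print(f"{name}: {value}")
```

In the entries above, minute 0 and hour 7 with asterisks in the remaining fields mean "at 07:00 on every day of every month".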

Benefits

This automated system offers several advantages:

Daily collection and summarization of the latest research papers in Japanese
Comprehensive coverage of multiple research areas through customizable keywords
Efficient information sharing via email notifications
Quick grasp of paper content through AI-generated summaries
Easy access to detailed information via PubMed links

Conclusion

This automated system reduces the time laboratory members spend on manual literature searches and supports the development of students' research skills. Refining the keywords regularly helps keep the results current and relevant to your field of study.