Calculating Antibody Titers (Endpoint Titers)

Introduction

Quantifying antibody titers in biological samples plays a crucial role in many scientific fields. One widely used method is the endpoint titer, determined through a systematic serial-dilution approach.

In the conventional procedure, the sample is serially diluted until the absorbance falls below a predetermined threshold; the dilution at which the signal crosses this cutoff is reported as the endpoint titer.

In standard laboratory practice, the cutoff is commonly set at twice the mean absorbance of the blank wells, which provides a reliable baseline for judging significant antibody presence.
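
As a minimal sketch of the idea (independent of the full script below; the values and the 3-fold dilution scheme are hypothetical), the endpoint titer can be read directly off a dilution series:

import numpy as np

# Hypothetical duplicate blank wells and a 12-point, 3-fold dilution series
blank = np.array([0.05, 0.07])
dilutions = 100 * 3 ** np.arange(12)          # 1:100, 1:300, 1:900, ...
absorbance = np.array([2.1, 1.9, 1.6, 1.2, 0.8, 0.5,
                       0.3, 0.2, 0.15, 0.12, 0.11, 0.10])

cutoff = 2 * blank.mean()                     # convention: 2 x mean blank absorbance
above = absorbance > cutoff
# Endpoint titer: the highest dilution whose signal still exceeds the cutoff
endpoint = dilutions[above][-1] if above.any() else None
print(f"Cutoff: {cutoff:.2f}, endpoint titer: 1:{endpoint}")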

To streamline this process, I present below a Python script that performs these calculations efficiently, enabling rapid and accurate determination of endpoint titers.

Script Overview: Automated Antibody Titer Analysis System

Core Functionalities

  • Primary Analysis Capabilities

    • Analysis of ELISA assay absorbance data

    • Implementation of 4PL and 5PL logistic regression fitting algorithms

    • Automated calculation of antibody titers

    • Automated graph generation using the matplotlib library

Data Input Management

  • Data Source Compatibility

    • Direct data extraction from Excel files (.xlsx/.xls) or CSV files

    • Batch processing of multiple samples

    • Automated dilution factor calculations with formula support (see the example after this list)
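
For the formula support, the script's evaluate_dilution_rates helper (defined below) interprets Excel-style strings of the form "=<cell>*<n>" as "previous dilution × n". A hypothetical example:

# Row 1 of the sheet might contain a starting dilution followed by
# Excel formulas; each "=<cell>*3" is read as "previous value x 3"
rates = evaluate_dilution_rates([100, '=B1*3', '=C1*3', '=D1*3'])
print(rates)  # [100.0, 300.0, 900.0, 2700.0]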

Analytical Features

  • Advanced Statistical Analysis

    • Automated selection between 4PL/5PL models based on AIC criteria (the criterion is noted after this list)

    • Assessment of fitting quality through multiple parameters:

      • R² (coefficient of determination)

      • Adjusted R²

      • Root Mean Square Error (RMSE)

    • Automated detection of outliers and invalid data points

    • Calculation of antibody titers based on established cutoff values
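
For reference, the script scores each least-squares fit with the standard information criteria for Gaussian residuals: AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), where n is the number of data points, RSS the residual sum of squares, and k the number of fitted parameters (4 or 5). In auto mode, the model with the lower AIC is selected.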

Output Capabilities

  • Comprehensive Result Documentation

    • Generation of Excel-formatted results

    • Individual sample graph production in PNG format

    • Detailed logging functionality in verbose mode

    • Comprehensive fitting statistics reports

Graphical Representation

  • Visualization Features

    • Display of fitting curves

    • Plotting of experimental data points

    • Visual representation of cutoff thresholds

    • Clear indication of antibody titer values

Environmental Setup

  • Implementation through Conda virtual environment

# Create endpoint-titer.yml with the following contents
nano endpoint-titer.yml

# --- endpoint-titer.yml ---
name: endpoint-titer-analysis
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.12
  - numpy
  - pandas
  - scipy
  - openpyxl
  - pillow  # needed by openpyxl to embed the plot images in the workbook
  - matplotlib
# Create the analytical environment
conda env create -f endpoint-titer.yml

# Activate the analytical environment
conda activate endpoint-titer-analysis
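
As an optional sanity check, confirm that all dependencies import cleanly:

python -c "import numpy, pandas, scipy, matplotlib, openpyxl; print('environment OK')"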

Script

Data File Structure Specification
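
The script expects the layout described in its built-in help: row 1 contains the dilution rates in columns 2-13 (column 1 may hold a label), row 2 is skipped, and sample data begin at row 3 with the sample name in column 1 followed by 12 absorbance readings. With duplicates (the default), each sample occupies two consecutive rows, which are averaged. Dilution rates may be plain numbers or Excel-style formulas such as "=B1*3" (previous dilution × 3). A hypothetical layout:

Dilution   100    300    900    ...   (12 dilution columns)
(row 2 unused)
Sample A   2.10   1.85   1.52   ...
Sample A   2.05   1.90   1.48   ...
Sample B   1.80   1.42   1.05   ...
Sample B   1.75   1.38   1.10   ...

Save the full script below as endpoint_titer.py.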

import openpyxl
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from matplotlib import font_manager
from scipy.optimize import curve_fit
from openpyxl.drawing.image import Image
import io
import argparse
from pathlib import Path
import sys
import platform
import os

# Function to set the appropriate font based on the OS
def set_japanese_font():
    system_os = platform.system()
    
    if system_os == "Darwin":  # macOS
        jp_font_path = "/System/Library/Fonts/ヒラギノ角ゴシック W6.ttc"  # Hiragino Sans
    elif system_os == "Windows":  # Windows
        jp_font_path = "C:/Windows/Fonts/msgothic.ttc"  # MS Gothic
    else:
        # Linux and other systems: fall back to matplotlib's default font
        return None

    # Load and set the font (skip silently if the font file is missing)
    if not os.path.exists(jp_font_path):
        return None
    jp_font = font_manager.FontProperties(fname=jp_font_path)
    plt.rcParams['font.family'] = jp_font.get_name()
    return jp_font

def four_pl(x, A, B, C, D):
    """4-Parameter Logistic Regression.
    A: response at zero dilution, B: slope factor,
    C: inflection point (dilution), D: response at infinite dilution."""
    return D + (A-D)/(1.0+((x/C)**B))

def five_pl(x, A, B, C, D, E):
    """5-Parameter Logistic Regression.
    Same parameters as 4PL plus E, an asymmetry factor."""
    return D + (A-D)/(1.0+((x/C)**B))**E


def load_data(file_path, sheet_name='Sheet1', encoding='utf-8'):
    """
    Function to load data from Excel or CSV files
    
    Parameters:
    -----------
    file_path : str or Path
        Input file path (Excel or CSV)
    sheet_name : str
        Sheet name for Excel files
    encoding : str
        CSV file encoding (default: utf-8)
        
    Returns:
    --------
    pd.DataFrame
        Loaded data
    """
    file_path = Path(file_path)
    file_extension = file_path.suffix.lower()
    
    if file_extension in ['.xlsx', '.xls']:
        # Load Excel file (note: openpyxl reads .xlsx/.xlsm; legacy .xls
        # files should be converted to .xlsx first)
        wb = openpyxl.load_workbook(file_path)
        sheet = wb[sheet_name]
        data = []
        for row in sheet.iter_rows(values_only=True):
            data.append(row)
        df = pd.DataFrame(data)
    elif file_extension == '.csv':
        # Load CSV file
        try:
            df = pd.read_csv(file_path, header=None, encoding=encoding)
        except UnicodeDecodeError:
            # Try with CP932 (Windows Japanese) if UTF-8 fails
            if encoding == 'utf-8':
                df = pd.read_csv(file_path, header=None, encoding='cp932')
            else:
                raise
        
        # Validate column count (sample name + 12 data points)
        if len(df.columns) < 13:
            raise ValueError("CSV file must have at least 13 columns (sample name + 12 data points)")
    else:
        raise ValueError(f"Unsupported file format: {file_extension}")
    
    return df

def evaluate_dilution_rates(dilution_rates):
    """Evaluate and convert dilution rates to numerical values"""
    evaluated_rates = []
    for i, rate in enumerate(dilution_rates):
        if isinstance(rate, (int, float)):
            evaluated_rates.append(float(rate))
        elif isinstance(rate, str):
            if rate.startswith('='):
                parts = rate.split('*')
                if len(parts) == 2 and parts[1].isdigit():
                    if i == 0:
                        evaluated_rates.append(float(parts[1]))
                    else:
                        evaluated_rates.append(evaluated_rates[-1] * int(parts[1]))
                else:
                    try:
                        # Fall back to evaluating the formula body directly
                        # (eval on spreadsheet input: use trusted files only)
                        evaluated_rates.append(float(eval(rate[1:])))
                    except Exception:
                        print(f"Warning: Could not evaluate value '{rate}'")
                        return None
            else:
                try:
                    evaluated_rates.append(float(rate))
                except ValueError:
                    print(f"Warning: Could not convert value '{rate}' to number")
                    return None
        else:
            print(f"Warning: Unknown type of value '{rate}'")
            return None
    return evaluated_rates

def calculate_fit_metrics(y_true, y_pred, n_params):
    """Calculate fitting quality metrics"""
    n = len(y_true)
    residuals = y_true - y_pred
    rss = np.sum(residuals**2)
    tss = np.sum((y_true - np.mean(y_true))**2)

    r2 = 1 - (rss/tss)
    adj_r2 = 1 - ((1-r2)*(n-1)/(n-n_params-1))
    aic = n * np.log(rss/n) + 2 * n_params
    bic = n * np.log(rss/n) + n_params * np.log(n)
    rmse = np.sqrt(np.mean(residuals**2))
    
    return {
        'R2': r2,
        'Adjusted_R2': adj_r2,
        'AIC': aic,
        'BIC': bic,
        'RMSE': rmse
    }

def get_initial_params(y_data, dilution_rates):
    """Optimized initial parameter estimation"""
    A_init = np.max(y_data) * 1.05
    D_init = np.min(y_data) * 0.95
    B_init = 1.0
    
    mid_response = (A_init + D_init) / 2
    closest_idx = np.argmin(np.abs(y_data - mid_response))
    C_init = dilution_rates[closest_idx]
    
    E_init = 1.0
    
    return {
        'A': A_init,
        'B': B_init,
        'C': C_init,
        'D': D_init,
        'E': E_init
    }

def fit_curve(x_data, y_data, method, init_params, verbose=False):
    """Execute and evaluate curve fitting"""
    try:
        if method == '4':
            bounds = ([0, 0.5, 0, 0], [np.inf, 10, np.inf, np.inf])
            p0 = [init_params['A'], init_params['B'], init_params['C'], init_params['D']]
            popt, pcov = curve_fit(four_pl, x_data, y_data, p0=p0, bounds=bounds, maxfev=50000)
            y_fit = four_pl(x_data, *popt)
            n_params = 4
        else:
            bounds = ([0, 0.5, 0, 0, 0.5], [np.inf, 10, np.inf, np.inf, 5])
            p0 = [init_params['A'], init_params['B'], init_params['C'], init_params['D'], init_params['E']]
            popt, pcov = curve_fit(five_pl, x_data, y_data, p0=p0, bounds=bounds, maxfev=50000)
            y_fit = five_pl(x_data, *popt)
            n_params = 5

        metrics = calculate_fit_metrics(y_data, y_fit, n_params)
        
        if verbose:
            print("\nFitting Results:")
            print(f"  R² = {metrics['R2']:.4f}")
            print(f"  Adjusted R² = {metrics['Adjusted_R2']:.4f}")
            print(f"  RMSE = {metrics['RMSE']:.4e}")
            if metrics['R2'] < 0.99:
                print("  Warning: R² is below 0.99. Please check fitting quality.")

        return popt, pcov, metrics, y_fit

    except RuntimeError as e:
        raise RuntimeError(f"Fitting failed: {str(e)}")

def process_data_and_calculate_titer(file_path, sheet_name, output_path, cutoff, method, replicates=2, verbose=False, log_path=None, encoding='utf-8'):
    """
    Process ELISA data and calculate titers
    
    Parameters:
    -----------
    file_path : str
        Path to input file (Excel or CSV)
    sheet_name : str
        Sheet name for Excel files
    output_path : str
        Path to output Excel file
    cutoff : float
        Cutoff value
    method : str
        Fitting method ('4', '5', 'auto')
    replicates : int
        Number of technical replicates (1: single, 2: duplicate)
    verbose : bool
        Whether to display detailed output
    log_path : str, optional
        Path to log file
    encoding : str
        Encoding for CSV files (default: utf-8)
    """
    # Open log file
    log_file = open(log_path, 'w', encoding='utf-8') if log_path and verbose else None
    
    def log_print(*args, **kwargs):
        if verbose:
            print(*args, **kwargs)
            if log_file:
                output = ' '.join(str(arg) for arg in args)
                if 'end' in kwargs:
                    output += kwargs['end']
                else:
                    output += '\n'
                log_file.write(output)
                log_file.flush()

    try:
        if verbose:
            log_print(f"Processing started: {file_path}")
            log_print(f"File format: {Path(file_path).suffix.lower()}")
            log_print(f"Method: {method}PL fitting")
            log_print(f"Cutoff value: {cutoff}")
            log_print(f"Number of technical replicates: {replicates}")

        # Load data
        df = load_data(file_path, sheet_name, encoding)
        
        # Create output Excel file
        output_wb = openpyxl.Workbook()
        results_sheet = output_wb.active
        results_sheet.title = "Results"
        plots_sheet = output_wb.create_sheet("Plots")

        if verbose:
            log_print("\nData loading details:")
            log_print(f"Total rows: {len(df)}")
            log_print("First few rows:")
            log_print(df.head())

        # Get dilution rates from first row
        dilution_rates = df.iloc[0, 1:13].values
        if verbose:
            log_print(f"\nFound dilution rates: {dilution_rates}")

        evaluated_rates = evaluate_dilution_rates(dilution_rates)
        if evaluated_rates is None:
            raise ValueError("Dilution rate data contains invalid values.")
        
        dilution_rates = np.array(evaluated_rates, dtype=float)

        if verbose:
            log_print(f"Evaluated dilution rates: {dilution_rates}")

        # Sample data starts from third row
        sample_names = df.iloc[2:, 0].values
        df_data = df.iloc[2:, 1:13]

        # Detect data blocks (based on number of replicates)
        blocks = []
        row = 0
        while row < len(df_data):
            try:
                row_data = pd.to_numeric(df_data.iloc[row], errors='coerce')
                block_len = 8 * replicates // 2  # rows per block (4 samples x replicates)
                if not row_data.isna().all() and row + block_len <= len(df_data):
                    blocks.append((row, row + block_len - 1))
                    row += block_len
                else:
                    row += 1
            except Exception as e:
                if verbose:
                    log_print(f"Error while processing row {row}: {str(e)}")
                row += 1

        if verbose:
            log_print(f"\nDetected data blocks:")
            for i, (start, end) in enumerate(blocks):
                log_print(f"Block {i+1}: rows {start+1} to {end+1}")

        if not blocks:
            raise ValueError("No valid data blocks found.")

        results_df = pd.DataFrame(columns=[
            'Sample', 'Titer', 'R2', 'Adjusted_R2', 'RMSE', 'Fitting_Method'
        ])
        
        # Row cursor for placing plots on the Plots sheet
        row_position = 2
        for block_idx, (start_row, end_row) in enumerate(blocks):
            if verbose:
                log_print(f"\nStarting processing of block {block_idx+1}:")
                log_print(f"Row range: {start_row+1} to {end_row+1}")

            block_data = df_data.iloc[start_row:end_row+1]

            # Adjust data processing based on number of replicates
            for sample_idx in range(0, end_row - start_row + 1, replicates):
                try:
                    # Get replicate data
                    replicate_data = block_data.iloc[sample_idx:sample_idx+replicates]
                    replicate_numeric = replicate_data.apply(pd.to_numeric, errors='coerce')
                    y_data = replicate_numeric.mean().values

                    if verbose:
                        log_print(f"\n  Processing sample {sample_idx//replicates + 1}:")
                        log_print(f"  Data: {y_data}")

                    if np.isnan(y_data).any():
                        log_print(f"Warning: Sample {sample_idx//replicates + 1} contains invalid data")
                        continue

                    sample_name = sample_names[start_row + sample_idx]

                    if verbose:
                        log_print(f"Processing sample: {sample_name}")
                        process_rows = [start_row + sample_idx + i + 1 for i in range(replicates)]
                        log_print(f"Using data: average of rows {', '.join(map(str, process_rows))}")

                    init_params = get_initial_params(y_data, dilution_rates)

                    try:
                        if method == '4':
                            popt, pcov, metrics, y_fit = fit_curve(
                                dilution_rates, y_data, '4', init_params, verbose
                            )
                            final_method = '4'
                        elif method == '5':
                            popt, pcov, metrics, y_fit = fit_curve(
                                dilution_rates, y_data, '5', init_params, verbose
                            )
                            final_method = '5'
                        else:
                            metrics_4pl = None
                            metrics_5pl = None
                            
                            try:
                                popt_4pl, _, metrics_4pl, y_fit_4pl = fit_curve(
                                    dilution_rates, y_data, '4', init_params, verbose
                                )
                            except RuntimeError:
                                if verbose:
                                    log_print("4PL fitting failed")
                            
                            try:
                                popt_5pl, _, metrics_5pl, y_fit_5pl = fit_curve(
                                    dilution_rates, y_data, '5', init_params, verbose
                                )
                            except RuntimeError:
                                if verbose:
                                    log_print("5PL fitting failed")
                            
                            if metrics_4pl and metrics_5pl:
                                if metrics_4pl['AIC'] < metrics_5pl['AIC']:
                                    popt, metrics, y_fit = popt_4pl, metrics_4pl, y_fit_4pl
                                    final_method = '4'
                                else:
                                    popt, metrics, y_fit = popt_5pl, metrics_5pl, y_fit_5pl
                                    final_method = '5'
                            elif metrics_4pl:
                                popt, metrics, y_fit = popt_4pl, metrics_4pl, y_fit_4pl
                                final_method = '4'
                            elif metrics_5pl:
                                popt, metrics, y_fit = popt_5pl, metrics_5pl, y_fit_5pl
                                final_method = '5'
                            else:
                                raise RuntimeError("Both fitting methods failed")

                        titer = np.interp(cutoff, y_fit[::-1], dilution_rates[::-1])

                        new_row = pd.DataFrame([{
                            'Sample': sample_name,
                            'Titer': titer,
                            'R2': metrics['R2'],
                            'Adjusted_R2': metrics['Adjusted_R2'],
                            'RMSE': metrics['RMSE'],
                            'Fitting_Method': f'{final_method}PL'
                        }])
                        results_df = pd.concat([results_df, new_row], ignore_index=True)

                        # Set the appropriate font before plotting
                        jp_font = set_japanese_font()

                        # Error bars from the technical replicates
                        y_errors = replicate_numeric.sem().values if replicates > 1 else np.zeros_like(y_data)

                        plt.figure(figsize=(10, 6))
                        plt.errorbar(dilution_rates, y_data, 
                                    yerr=y_errors,
                                    fmt='o', label='Measured values',
                                    capsize=5)
                        plt.semilogx(dilution_rates, y_fit, '-', label='Fitting curve')
                        plt.axhline(y=cutoff, color='r', linestyle='--', label='Cutoff')
                        plt.axvline(x=titer, color='g', linestyle='--', label='Antibody titer')
                        plt.xlabel('Dilution rate')
                        plt.ylabel('Absorbance')
                        plt.title(f'{sample_name} ({final_method}PL fitting)')
                        plt.legend()
                        plt.grid(True)

                        # Save plot to an in-memory buffer for embedding in Excel
                        img_buffer = io.BytesIO()
                        plt.savefig(img_buffer, format='png', dpi=300)
                        img_buffer.seek(0)  # rewind so openpyxl can read the image

                        # Save as individual PNG file
                        plot_dir = Path(output_path).parent / 'plots'
                        if not os.path.exists(plot_dir):
                            os.makedirs(plot_dir)
                        plot_path = plot_dir / f'{sample_name}_plot.png'
                        plt.savefig(plot_path, dpi=300, bbox_inches='tight')
                        plt.close()

                        # Place plot in Excel (sample name label above each image)
                        img = Image(img_buffer)
                        img.width = 600
                        img.height = 400
                        plots_sheet.cell(row=row_position-1, column=1, value=sample_name)
                        plots_sheet.add_image(img, f'A{row_position}')

                        if verbose:
                            log_print(f"Plot placement: Sample {sample_name} placed at row {row_position}")

                        # Advance the cursor past the image for the next sample
                        row_position += 22

                    except Exception as e:
                        log_print(f"Warning: Error during fitting for {sample_name}: {str(e)}")
                
                except Exception as e:
                    log_print(f"Warning: Error processing block {block_idx+1}, sample {sample_idx//replicates + 1}: {str(e)}")

        # Write results to Results sheet
        for i, col in enumerate(results_df.columns):
            results_sheet.cell(row=1, column=i+1, value=col)
        
        for i, row in results_df.iterrows():
            for j, value in enumerate(row):
                results_sheet.cell(row=i+2, column=j+1, value=value)

        # Adjust column widths
        for column in results_sheet.columns:
            cells = list(column)
            max_length = 0
            for cell in cells:
                try:
                    if len(str(cell.value)) > max_length:
                        max_length = len(str(cell.value))
                except Exception:
                    pass
            results_sheet.column_dimensions[cells[0].column_letter].width = max_length + 2

        # Save workbook and return results
        output_wb.save(output_path)
        return len(results_df)

    except Exception as e:
        raise Exception(f"Error occurred during processing: {str(e)}") from e

    finally:
        if log_file:
            log_file.close()

def parse_args():
    parser = argparse.ArgumentParser(
        description='ELISA Analysis Tool - Optimized Version',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Input file formats:
  - Excel (.xlsx, .xls)
  - CSV (.csv)
    
Note: 
  - Both formats require dilution rates in row 1 and data from row 3 onwards
  - CSV files must have at least 13 columns (sample name + 12 data points)
  - Output is always in Excel (.xlsx) format"""
    )
    
    parser.add_argument('--input', '-i', required=True,
                       help='Input file (Excel or CSV)')
    
    parser.add_argument('--cutoff', '-c', type=float, required=True,
                       help='Cutoff value')
    
    parser.add_argument('--method', '-m', choices=['4', '5', 'auto'],
                       default='auto',
                       help='Fitting method (4: 4PL, 5: 5PL, auto: automatic selection)')
    
    parser.add_argument('--replicates', '-r', type=int, choices=[1, 2],
                       default=2,
                       help='Number of technical replicates (1: single, 2: duplicate)')
    
    parser.add_argument('--verbose', '-v', action='store_true',
                       help='Display detailed output')
    
    parser.add_argument('--encoding', '-e', default='utf-8',
                       help='CSV file encoding (default: utf-8)')
    
    return parser.parse_args()

def main():
    args = parse_args()
    
    try:
        input_path = Path(args.input)
        
        # Check input file existence
        if not input_path.exists():
            print(f"Error: File '{input_path}' not found", file=sys.stderr)
            return 1

        # Check file format
        if input_path.suffix.lower() not in ['.xlsx', '.xls', '.csv']:
            print(f"Error: Unsupported file format: {input_path.suffix}", file=sys.stderr)
            print("Supported formats: .xlsx, .xls, .csv", file=sys.stderr)
            return 1
        
        # Generate output filename (output in xlsx format even for CSV input)
        output_path = input_path.parent / f'results_{input_path.stem}.xlsx'
        
        # Set log file path
        log_path = None
        if args.verbose:
            log_path = input_path.parent / f'analysis_log_{input_path.stem}.txt'
        
        # Set sheet name based on file format
        sheet_name = 'Sheet1' if input_path.suffix.lower() in ['.xlsx', '.xls'] else None
        
        try:
            num_samples = process_data_and_calculate_titer(
                args.input,
                sheet_name,
                output_path,
                args.cutoff,
                args.method,
                args.replicates,
                args.verbose,
                log_path,
                encoding=args.encoding
            )
            
            print(f"Processing complete: Analyzed {num_samples} samples")
            print(f"Results saved to {output_path}")
            if args.verbose:
                print(f"Analysis log saved to {log_path}")
            
        except UnicodeDecodeError as e:
            print(f"Encoding error: {str(e)}", file=sys.stderr)
            print("Please specify appropriate encoding using --encoding option", file=sys.stderr)
            print("Example: -e shift-jis or -e cp932", file=sys.stderr)
            return 1
            
        except ValueError as e:
            print(f"Error: {str(e)}", file=sys.stderr)
            return 1
            
        except Exception as e:
            print(f"Unexpected error occurred: {str(e)}", file=sys.stderr)
            if args.verbose:
                import traceback
                traceback.print_exc()
            return 1
        
    except Exception as e:
        print(f"Error occurred: {str(e)}", file=sys.stderr)
        if args.verbose:
            import traceback
            traceback.print_exc()
        return 1
    
    return 0

if __name__ == '__main__':
    sys.exit(main())
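
Running the script writes results_<input>.xlsx next to the input file, saves each sample's plot as a PNG under a plots/ subdirectory, and, in verbose mode, also writes analysis_log_<input>.txt.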

Usage example

usage: endpoint_titer.py [-h] --input INPUT --cutoff CUTOFF [--method {4,5,auto}] [--replicates {1,2}] [--verbose] [--encoding ENCODING]

ELISA Analysis Tool - Optimized Version

options:
  -h, --help            Display help message and exit
  --input INPUT, -i INPUT
                        Input file (Excel or CSV)
  --cutoff CUTOFF, -c CUTOFF
                        Cutoff value
  --method {4,5,auto}, -m {4,5,auto}
                        Fitting method (4: 4PL, 5: 5PL, auto: automatic selection)
  --replicates {1,2}, -r {1,2}
                        Number of technical replicates (1: single, 2: duplicate)
  --verbose, -v         Display detailed output
  --encoding ENCODING, -e ENCODING
                        CSV file encoding (default: utf-8)



Implementation Examples

Basic Implementation

python endpoint_titer.py --input data.xlsx --cutoff 0.5

Specified Fitting Methods

# Implementation with 4PL fitting
python endpoint_titer.py --input data.xlsx --cutoff 0.5 --method 4

# Implementation with 5PL fitting
python endpoint_titer.py --input data.xlsx --cutoff 0.5 --method 5

# Implementation with automatic method selection
python endpoint_titer.py --input data.xlsx --cutoff 0.5 --method auto

Enhanced Logging Implementation

python endpoint_titer.py --input data.xlsx --cutoff 0.5 --verbose
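
CSV Encoding Specification

# For CSV input in a Japanese encoding; the script also falls back to CP932
# automatically if UTF-8 decoding fails
python endpoint_titer.py --input data.csv --cutoff 0.5 --encoding cp932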

Conclusion

This automated system significantly streamlines the calculation of antibody titers, reducing what was previously a time-consuming process to a matter of moments. The increased efficiency in data analysis should allow researchers and educators to allocate more time to their core academic and research activities.

It is worth noting that this represents just one of several analytical tools in our developing suite of research scripts. Additional computational resources and analytical tools will be introduced in subsequent communications.

