MedStat Statistical Methods Documentation

This document provides technical documentation of the statistical methods available in MedStat. Each section covers methodology, assumptions, applications, and interpretation guidelines for medical statistical analysis.

1. Firth Logistic Regression

Overview

Firth Logistic Regression is a penalized logistic regression method developed by David Firth (1993) that provides improved inference for binary logistic regression, particularly in small samples and in the presence of complete separation.

Mathematical Framework

Standard logistic regression models the probability of a binary outcome:

P(Y=1|X) = exp(Xβ) / (1 + exp(Xβ))

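Firth's approach adds a penalty term to the log-likelihood to remove the first-order bias of the maximum likelihood estimator:

ℓ*(β) = ℓ(β) + 0.5 × log|I(β)|

where ℓ(β) is the log-likelihood, I(β) is the Fisher information matrix, and |I(β)| is its determinant. The penalty corresponds to Jeffreys' invariant prior.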
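The penalized estimate is usually computed by solving Firth's modified score equations rather than by maximizing the penalized likelihood directly. The sketch below is a minimal NumPy illustration of that approach, not MedStat's implementation. The function name `firth_logistic` is hypothetical, and the design matrix is assumed to already include an intercept column. The sketch also omits the step-halving safeguards that production implementations typically add. It returns Wald standard errors from the inverse information; profile penalized likelihood intervals are generally preferred for reporting.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression (hypothetical helper, illustrative only).

    X: (n, p) design matrix that already includes an intercept column.
    y: length-n array of 0/1 outcomes.
    Returns (beta, se): penalized estimates and Wald standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])

    def information(b):
        pi = 1.0 / (1.0 + np.exp(-X @ b))        # fitted probabilities
        w = pi * (1.0 - pi)                       # logistic variance weights
        return pi, w, X.T @ (X * w[:, None])      # Fisher information I(b) = X'WX

    for _ in range(max_iter):
        pi, w, info = information(beta)
        info_inv = np.linalg.inv(info)
        # Leverages: diagonal of H = W^(1/2) X I^(-1) X' W^(1/2)
        h = np.einsum("ij,jk,ik->i", X, info_inv, X) * w
        # Firth's modified score: X'(y - pi + h * (1/2 - pi))
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    _, _, info = information(beta)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Toy data with complete separation: every x = 1 subject has y = 1, every x = 0 subject has y = 0.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]  # intercept column, binary predictor
y = [0, 0, 1, 1]
beta, se = firth_logistic(X, y)
print(beta)  # finite: slope = log(25), as if 0.5 were added to each cell of the 2x2 table
```

Ordinary maximum likelihood has no finite solution for these data, whereas the penalized iterations converge.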

Key Advantages

✓ Handles small samples: Provides more reliable estimates than standard maximum likelihood when data are limited
✓ Complete separation: Produces finite estimates when the outcome is perfectly separated by one or more predictors, where standard estimates diverge
✓ Reduced bias: The penalized likelihood reduces bias in coefficient estimates
✓ Confidence intervals: Computable even under separation, unlike standard logistic regression

Limitations

✗ Computational complexity: More computationally intensive than standard logistic regression
✗ Interpretation: Coefficients represent conditional effects, not marginal effects

Assumptions

When to Use Firth Logistic Regression

Interpretation Guide

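Statistic | Interpretation | Example
Coefficient (β) | Change in log-odds per one-unit increase in the predictor, other covariates held constant | β = 0.5: log-odds increase by 0.5
Odds Ratio (OR) | exp(β): multiplicative change in the odds of the outcome | OR = 2.0: the odds are doubled
95% CI | Range of plausible values for the effect | OR 95% CI 1.2 to 3.5 excludes the null value (OR = 1)
p-value | Probability of an effect at least this extreme if the true effect were null | p < 0.05: statistically significant at α = 0.05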
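The odds ratio, confidence interval, and p-value in this table follow directly from a coefficient and its standard error. The sketch below uses the Wald (normal-approximation) formulas. The helper name and the inputs β = 0.5 and SE = 0.2 are hypothetical. MedStat may report profile-likelihood intervals for Firth models instead.

```python
import math


def summarize_coefficient(beta, se, z_crit=1.959964):
    """Odds ratio, Wald 95% CI, and two-sided Wald p-value for a log-odds coefficient."""
    odds_ratio = math.exp(beta)
    ci_low = math.exp(beta - z_crit * se)
    ci_high = math.exp(beta + z_crit * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return odds_ratio, ci_low, ci_high, p_value


or_, lo, hi, p = summarize_coefficient(0.5, 0.2)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.4f}")
```

The interval is computed on the log-odds scale and then exponentiated, which is why it is not symmetric around the odds ratio.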

2. Propensity Score Matching (PSM)

Overview

Propensity Score Matching is a method for observational studies that estimates the causal effect of a treatment by matching treated and control subjects with similar propensity scores (estimated probabilities of receiving treatment), creating groups that are balanced on observed covariates.

Methodological Framework

PSM operates in four steps:

  1. Propensity Score Estimation: Estimate probability of treatment using logistic regression or other methods
  2. Common Support Region: Ensure overlap in propensity score distributions
  3. Matching Algorithm: Match treated and control units with similar scores
  4. Balance Assessment: Verify covariate balance after matching
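Step 2 can be made concrete with the simple min-max rule. Under this rule, only units are kept whose score lies between the larger of the two group minima and the smaller of the two group maxima. This is one common definition of common support, used here as an assumption, since MedStat's own rule is not specified in this section. `common_support` is a hypothetical helper.

```python
def common_support(scores, treated):
    """Indices of units inside the min-max overlap of treated and control scores (hypothetical helper)."""
    treated_scores = [s for s, z in zip(scores, treated) if z == 1]
    control_scores = [s for s, z in zip(scores, treated) if z == 0]
    lower = max(min(treated_scores), min(control_scores))
    upper = min(max(treated_scores), max(control_scores))
    return [i for i, s in enumerate(scores) if lower <= s <= upper]


scores = [0.10, 0.30, 0.50, 0.70, 0.20, 0.40, 0.90]
treated = [0, 0, 0, 0, 1, 1, 1]
print(common_support(scores, treated))  # [1, 2, 3, 4, 5]
```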

Propensity Score Definition

e(X) = P(Z=1|X), the probability of receiving treatment (Z = 1) given the observed covariates X

Matching Algorithms Available in MedStat

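Algorithm | Description | Best For
1:1 Matching | Each treated unit matched to one control | Closest matches and a simple analysis
1:2 Matching | Each treated unit matched to two controls | Increasing efficiency when the treated group is small relative to the control pool
Caliper Matching | Matches accepted only if propensity scores differ by less than a specified threshold (the caliper) | Ensuring match quality by excluding poor matches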
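The algorithms in this table can be sketched as greedy nearest-neighbour matching without replacement on the logit of the propensity score. In the sketch below, `greedy_match` and its parameters are hypothetical. The default caliper of 0.2 standard deviations of the logit follows a widely used recommendation in the propensity score literature. For simplicity, however, the SD is computed over all units rather than pooled within groups. Treated units are processed in index order, which affects which controls each one receives. MedStat's actual matching options may differ.

```python
import math
import statistics


def greedy_match(ps, treated, ratio=1, caliper_sd=0.2):
    """Greedy nearest-neighbour matching on logit(ps), without replacement (hypothetical helper).

    ps: propensity scores strictly between 0 and 1; treated: 1 = treated, 0 = control.
    ratio: controls per treated unit (1 for 1:1, 2 for 1:2).
    caliper_sd: caliper as a multiple of the SD of logit(ps); None disables the caliper.
    Returns {treated_index: [matched control indices]}; unmatched treated units are omitted.
    """
    logit = [math.log(p / (1.0 - p)) for p in ps]
    caliper = None if caliper_sd is None else caliper_sd * statistics.stdev(logit)
    available = sorted(i for i, z in enumerate(treated) if z == 0)
    matches = {}
    for i in (k for k, z in enumerate(treated) if z == 1):
        chosen = []
        for _ in range(ratio):
            eligible = [j for j in available
                        if caliper is None or abs(logit[i] - logit[j]) <= caliper]
            if not eligible:
                break  # no remaining control within the caliper
            best = min(eligible, key=lambda j: abs(logit[i] - logit[j]))
            chosen.append(best)
            available.remove(best)  # without replacement
        if chosen:
            matches[i] = chosen
    return matches


ps = [0.30, 0.60, 0.32, 0.58, 0.90, 0.10]
treated = [1, 1, 0, 0, 0, 0]
print(greedy_match(ps, treated, ratio=1, caliper_sd=None))  # 1:1 without caliper
print(greedy_match(ps, treated, ratio=2, caliper_sd=0.2))   # 1:2 with caliper
```

In the second call the caliper rejects every candidate second control, so each treated unit keeps only its single close match. This illustrates the trade-off between the number of matches and their quality.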

Key Advantages

✓ Reduces selection bias: Makes groups comparable on observed covariates
✓ Causal inference: Enables estimation of causal effects from observational data, provided there is no unmeasured confounding
✓ Flexible: Works with various treatment definitions and outcomes

Limitations

✗ Hidden bias: Cannot account for unmeasured confounders
✗ Common support: Units outside the overlap region are excluded, reducing sample size and possibly generalizability
✗ Model dependence: Results depend on propensity score model specification

Assessing Covariate Balance

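After matching, evaluate balance using standardized differences, which express the difference in group means in units of a pooled standard deviation. For a continuous covariate:

d = (x̄_T − x̄_C) / √((s_T² + s_C²) / 2)

where x̄ and s² are the mean and variance in the treated (T) and control (C) groups. An absolute standardized difference below 0.1 is commonly taken to indicate adequate balance.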
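The standardized difference for a continuous covariate can be computed with the Python standard library alone. This minimal sketch assumes the pooled-SD denominator √((s_T² + s_C²)/2) with sample variances. `standardized_difference` and the age values are hypothetical, and MedStat's handling of binary covariates is not covered.

```python
import math
import statistics


def standardized_difference(x_treated, x_control):
    """Standardized mean difference for a continuous covariate (hypothetical helper).

    Uses the pooled SD sqrt((s_T^2 + s_C^2) / 2), with sample variances.
    """
    mean_diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = math.sqrt((statistics.variance(x_treated) + statistics.variance(x_control)) / 2)
    return mean_diff / pooled_sd


# Illustrative ages in a matched sample (not real data).
age_treated = [61, 65, 70, 58]
age_control = [60, 66, 69, 57]
d = standardized_difference(age_treated, age_control)
print(round(d, 3))  # 0.094
```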

3. Kaplan-Meier Survival Analysis

Overview

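The Kaplan-Meier estimator is a non-parametric method for estimating survival probability over time. It accounts for censored observations (subjects whose event time is not observed, for example because they were lost to follow-up or remained event-free at the end of the study) and produces step-function survival curves.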

Kaplan-Meier Estimator

S(t) = ∏ᵢ (1 − dᵢ/nᵢ), taken over all distinct event times tᵢ ≤ t

where dᵢ = number of events at time tᵢ and nᵢ = number of subjects at risk just before tᵢ
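The product-limit formula translates directly into code. The procedure walks through the distinct times in order and multiplies by (1 − dᵢ/nᵢ) at each event time. It then removes both events and censored subjects from the risk set. A minimal standard-library sketch follows. `kaplan_meier` is a hypothetical helper, and it follows the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate (hypothetical helper).

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns [(t_i, S(t_i))] at each distinct event time t_i.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per time
    leaving = Counter(times)              # subjects leaving the risk set at each time
    n_at_risk = len(times)                # n_i, updated as time advances
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t] > 0:
            survival *= 1.0 - deaths[t] / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]           # events and censorings at t leave after t
    return curve


times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```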

Key Characteristics

Kaplan-Meier Curve Elements

Element | Meaning
Y-axis | Estimated cumulative survival probability (0 to 1)
X-axis | Follow-up time (days, months, or years)
Vertical drop | One or more events (e.g., deaths) occurred
Small mark (+) | Censored observation (follow-up ended without an observed event)

When to Use Kaplan-Meier

Median Survival Time

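The median survival time is the earliest time at which the estimated survival probability S(t) falls to 0.5 or below, i.e., the time by which an estimated 50% of subjects have experienced the event. It is a clinically meaningful summary for comparing survival between groups, but it cannot be estimated if the curve never falls to 0.5.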
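Given a Kaplan-Meier curve as a list of (event time, survival estimate) pairs, the median is the first event time at which the estimate reaches 0.5 or below. The sketch assumes that list format, which is a hypothetical representation, and returns None when the curve never drops to 0.5. Some software interpolates when the curve sits exactly at 0.5 over an interval; this sketch does not.

```python
def median_survival(curve):
    """First event time at which the Kaplan-Meier estimate is <= 0.5, or None if never reached.

    curve: list of (event_time, survival_estimate) pairs sorted by time (hypothetical format).
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Example curve for six subjects (illustrative values, not real data).
curve = [(1, 5 / 6), (2, 4 / 6), (3, 4 / 9), (5, 0.0)]
print(median_survival(curve))  # 3
```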

4. Cox Proportional Hazards Regression

Overview

Cox Proportional Hazards Regression is a semi-parametric method for modeling the relationship between survival time and multiple covariates. It doesn't assume a specific survival distribution.

Cox Model Equation

h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)

where h₀(t) is the baseline hazard function (left unspecified) and exp(βⱼ) is the hazard ratio for a one-unit increase in Xⱼ, other covariates held constant
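Because h₀(t) cancels when each event is compared with everyone still at risk, the coefficients are estimated by maximizing a partial likelihood that does not involve the baseline hazard. The sketch below only evaluates the log partial likelihood for given coefficients; a fitting routine would maximize it. `cox_partial_loglik` and the toy data are hypothetical. Tied event times are handled with Breslow's approximation, whereas some software (for example R's `survival::coxph`) defaults to Efron's method.

```python
import math


def cox_partial_loglik(beta, X, times, events):
    """Cox log partial likelihood with Breslow's handling of tied event times (hypothetical helper).

    beta: coefficient list; X: list of covariate vectors (one per subject);
    times: follow-up times; events: 1 = event observed, 0 = censored.
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors X*beta
    loglik = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i == 1:
            # Risk set: everyone whose follow-up time is at least t_i
            denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            loglik += eta[i] - math.log(denom)
    return loglik


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
X = [[1.0], [0.0], [1.0], [0.0]]
print(cox_partial_loglik([0.0], X, times, events))
print(cox_partial_loglik([0.5], X, times, events))
```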

Key Assumptions

Hazard Ratio Interpretation

HR Value | Interpretation
HR = 1.0 | No effect: the hazard is the same in both groups
HR > 1.0 | Increased hazard (higher mortality/event rate)
HR < 1.0 | Decreased hazard (protective effect)
HR = 2.0 | 100% increase in hazard (hazard doubled)
HR = 0.5 | 50% reduction in hazard (hazard halved)
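The percent change in hazard is (HR − 1) × 100. A Cox coefficient converts to a hazard ratio as exp(β) per one-unit increase, or exp(kβ) for a k-unit increase. A minimal sketch with hypothetical helper names:

```python
import math


def hazard_ratio(beta, units=1.0):
    """Hazard ratio for a change of `units` in a covariate with Cox coefficient beta."""
    return math.exp(beta * units)


def percent_change_in_hazard(hr):
    """Percent change in the hazard implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


print(percent_change_in_hazard(2.0))   # 100.0 (hazard doubled)
print(percent_change_in_hazard(0.5))   # -50.0 (hazard halved)
print(hazard_ratio(math.log(2.0)))     # HR = 2 for a one-unit increase
```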

Model Diagnostics

Advantages and Limitations

✓ Semi-parametric: Flexible; does not require specifying the survival distribution
✓ Multivariable analysis: Adjusts for multiple covariates simultaneously
✗ Proportional hazards assumption: Hazard ratios are assumed constant over time, which may not hold in some datasets (check, e.g., with Schoenfeld residuals)

References and Further Reading

Citation for Your Research

If you use MedStat for published research, please include this citation in your methods section:

MedStat: Professional Medical Statistics Tool. Available at https://medstat.app
