MedStat Statistical Methods Documentation

This document provides technical documentation of the statistical methods available in MedStat. Each section covers the methodology, assumptions, applications, and interpretation guidelines for one method used in medical statistical analysis.

1. Firth Logistic Regression

Overview

Firth Logistic Regression is a penalized logistic regression method developed by David Firth (1993). It reduces the small-sample bias of maximum likelihood estimates for binary outcomes and, unlike standard logistic regression, yields finite estimates even in the presence of complete separation.

Mathematical Framework

Standard logistic regression models the probability of a binary outcome:

P(Y=1|X) = exp(Xβ) / (1 + exp(Xβ))

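Firth's approach adds a penalty term to the log-likelihood that removes the first-order (O(n⁻¹)) bias of the maximum likelihood estimator:

log L*(β) = log L(β) + 0.5 * log|I(β)|

where I(β) is the Fisher information matrix and |I(β)| is its determinant. Equivalently, the likelihood is multiplied by |I(β)|^(1/2), the Jeffreys invariant prior.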

Key Advantages

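✓ Handles small samples: Gives less biased estimates than standard maximum likelihood when data are limited

✓ Complete separation: Produces finite estimates even when the predictors perfectly separate the outcome

✓ Reduced bias: The penalty removes the first-order bias of the coefficient estimates

✓ Confidence intervals: Because the estimates are always finite, confidence intervals can be computed even under separation, where standard logistic regression fails (profile penalized-likelihood intervals are preferred over Wald intervals)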

Limitations

✗ Computational complexity: More computationally intensive than standard logistic regression

✗ Interpretation: As in standard logistic regression, coefficients represent conditional (covariate-adjusted) effects, not marginal (population-averaged) effects

Assumptions

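Binary outcome variable (coded 0/1)
Independence of observations
No perfect or severe multicollinearity among predictors
Correct model specification (e.g., a linear relationship between continuous predictors and the log-odds)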

When to Use Firth Logistic Regression

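Small samples (n < 100 as a rough guide), especially with few events per predictor
Rare events or outcomes
Evidence of complete or quasi-complete separation
Binary outcome analyses in medical research where any of the above apply
When standard logistic regression fails to converge or produces extremely large (effectively infinite) coefficients and standard errors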

Interpretation Guide

Statistic	Interpretation	Example
Coefficient (β)	Change in log-odds per one-unit increase in the predictor	β=0.5 means the log-odds increase by 0.5
Odds Ratio (OR)	exp(β): multiplicative change in the odds per one-unit increase	OR=2.0 means the odds double
95% CI	Range of effect values compatible with the data	CI 1.2-3.5 excludes the null value (OR=1)
p-value	Probability of a result at least this extreme if the true effect is null (β=0)	p<0.05 indicates statistical significance at α=0.05
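The quantities in this table are linked by simple arithmetic on the coefficient β and its standard error SE(β). The minimal sketch below uses a hypothetical helper, `odds_ratio_summary`, to compute the odds ratio, a Wald-type 95% CI, exp(β ± 1.96·SE), and a two-sided p-value. It illustrates the interpretation only; for Firth models, profile penalized-likelihood intervals are generally preferred to Wald intervals, and MedStat's own output may be computed differently.

```python
import math


def odds_ratio_summary(beta, se, z_crit=1.959964):
    """Return (OR, CI lower, CI upper, two-sided p) from a log-odds coefficient and its SE.

    Uses the Wald (normal approximation) interval exp(beta +/- z * SE).
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = beta / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return odds_ratio, lower, upper, p_value


# Example: beta = log(2) (the odds double per unit increase) with SE = 0.3
or_, lo, hi, p = odds_ratio_summary(math.log(2.0), 0.3)
print(f"OR={or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, p={p:.3f}")
```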

2. Propensity Score Matching (PSM)

Overview

Propensity Score Matching is a method for observational studies that estimates the causal effect of a treatment by matching treated and control subjects with similar propensity scores (the probability of receiving treatment given observed covariates), creating comparison groups that are balanced on those covariates.

Methodological Framework

PSM operates in four steps:

Propensity Score Estimation: Estimate probability of treatment using logistic regression or other methods
Common Support Region: Ensure overlap in propensity score distributions
Matching Algorithm: Match treated and control units with similar scores
Balance Assessment: Verify covariate balance after matching

Propensity Score Definition

e(X) = P(Z=1|X) = probability of receiving treatment (Z=1) given observed covariates X
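When the covariates are few and discrete, e(X) can be estimated directly as the proportion of treated subjects within each covariate pattern; with continuous covariates a model such as logistic regression is used instead. The minimal sketch below shows that saturated estimate together with the common support step, using a simple min-max rule that keeps only subjects whose score lies within the range observed in both groups. The function names and the min-max rule are assumptions for illustration, not necessarily MedStat's approach.

```python
from collections import defaultdict


def propensity_by_pattern(covariates, treated):
    """Estimate e(X) = P(Z=1 | X) as the treated fraction within each covariate pattern."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [number treated, number of subjects]
    for x, z in zip(covariates, treated):
        counts[x][0] += z
        counts[x][1] += 1
    return [counts[x][0] / counts[x][1] for x in covariates]


def common_support(scores, treated):
    """Indices of subjects whose score lies in the overlap of the treated and control ranges."""
    treated_scores = [s for s, z in zip(scores, treated) if z == 1]
    control_scores = [s for s, z in zip(scores, treated) if z == 0]
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return [i for i, s in enumerate(scores) if low <= s <= high]


# Hypothetical covariate patterns (sex, age group); z = 1 means treated
X = [("F", "old"), ("F", "old"), ("M", "young"), ("M", "young"), ("M", "young"), ("F", "young")]
z = [1, 0, 1, 1, 0, 0]
e = propensity_by_pattern(X, z)
print(common_support(e, z))  # [0, 1, 2, 3, 4]: subject 5 has e(X) = 0 and is dropped
```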

Matching Algorithms Available in MedStat

Algorithm	Description	Best For
1:1 Matching	Each treated unit matched to one control	Closest matches, minimizing bias
1:2 Matching	Each treated unit matched to two controls	Increasing precision when controls greatly outnumber treated units
Caliper Matching	Match only within a specified maximum distance (e.g., 0.2 SD of the logit of the propensity score)	Preventing poor-quality matches
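A common way to implement 1:1 caliper matching is greedy nearest-neighbour matching without replacement on the logit of the propensity score, with a caliper of 0.2 standard deviations of that logit (a widely used rule of thumb). The minimal sketch below assumes that approach; `caliper_match` is a hypothetical name, and MedStat's own algorithm and default caliper may differ.

```python
import math
import statistics


def caliper_match(scores, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on logit(e(X)).

    scores: propensity scores strictly between 0 and 1; treated: 1/0 indicators.
    Returns (treated_index, control_index) pairs; treated units with no control
    within the caliper stay unmatched.
    """
    logits = [math.log(s / (1.0 - s)) for s in scores]
    caliper = caliper_sd * statistics.stdev(logits)
    available = sorted(i for i, z in enumerate(treated) if z == 0)
    pairs = []
    for i in (idx for idx, z in enumerate(treated) if z == 1):
        if not available:
            break
        # Nearest remaining control on the logit scale (ties go to the lower index)
        best = min(available, key=lambda j: abs(logits[i] - logits[j]))
        if abs(logits[i] - logits[best]) <= caliper:
            pairs.append((i, best))
            available.remove(best)  # without replacement: each control used at most once
    return pairs


scores = [0.30, 0.70, 0.32, 0.69, 0.90, 0.10]
treated = [1, 1, 0, 0, 0, 0]
print(caliper_match(scores, treated))  # [(0, 2), (1, 3)]
```

Because the matching is greedy, the order in which treated units are processed can change the result; optimal matching avoids this at a higher computational cost.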

Key Advantages

✓ Reduces selection bias: Makes treated and control groups comparable on measured covariates

✓ Causal inference: Enables estimation of causal effects from observational data, assuming no unmeasured confounding

✓ Flexible: Works with various treatment definitions and outcomes

Limitations

✗ Hidden bias: Cannot account for unmeasured confounders

✗ Common support: Units outside the overlap region are excluded, which reduces sample size and can limit generalizability

✗ Model dependence: Results depend on propensity score model specification

Assessing Covariate Balance

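After matching, evaluate balance for each covariate using the absolute standardized difference (the difference in group means divided by the pooled standard deviation):

Standardized difference < 0.1: Good balance
0.1-0.2: Acceptable balance
> 0.2: Poor balance; consider revising the propensity score model or the matching approach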
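The standardized difference can be computed per covariate as |mean_t - mean_c| / sqrt((s_t² + s_c²) / 2) for continuous variables, or with p(1 - p) in place of the variances for binary variables. The minimal sketch below uses hypothetical function names and applies the thresholds listed above; it is an illustration, not MedStat's code.

```python
import math
import statistics


def standardized_difference(treated_values, control_values):
    """Absolute standardized difference for a continuous covariate.

    |mean_t - mean_c| / sqrt((var_t + var_c) / 2), using sample variances.
    """
    mean_diff = statistics.fmean(treated_values) - statistics.fmean(control_values)
    pooled_sd = math.sqrt((statistics.variance(treated_values) + statistics.variance(control_values)) / 2.0)
    return abs(mean_diff) / pooled_sd


def standardized_difference_binary(p_treated, p_control):
    """Absolute standardized difference for a binary covariate with the given prevalences."""
    pooled_var = (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2.0
    return abs(p_treated - p_control) / math.sqrt(pooled_var)


def balance_label(d):
    """Classify an absolute standardized difference with the 0.1 / 0.2 thresholds."""
    if d < 0.1:
        return "good"
    if d <= 0.2:
        return "acceptable"
    return "poor"


ages_treated = [60, 62, 64]
ages_control = [61, 63, 65]
d_age = standardized_difference(ages_treated, ages_control)
print(d_age, balance_label(d_age))  # 0.5 poor
```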

3. Kaplan-Meier Survival Analysis

Overview

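The Kaplan-Meier estimator is a non-parametric method for estimating survival probability over time. It accounts for censored observations (subjects whose event time is not observed, for example because they were lost to follow-up or were still event-free at the end of the study) and produces a step-function survival curve.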

Kaplan-Meier Estimator

S(t) = ∏ᵢ (1 - dᵢ/nᵢ), taken over all distinct event times tᵢ ≤ t

where dᵢ = number of events at time tᵢ and nᵢ = number of subjects at risk (still under observation) just before tᵢ
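The product formula translates directly into a single pass over the sorted follow-up times. The minimal sketch below uses a hypothetical function name and the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: returns (t_i, S(t_i)) at each distinct event time t_i.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)  # n_i: subjects still under observation just before t_i
    survival = 1.0
    curve = []
    for t, group in groupby(data, key=lambda row: row[0]):
        rows = list(group)
        deaths = sum(event for _, event in rows)  # d_i: events at time t
        if deaths:
            survival *= 1.0 - deaths / n_at_risk  # multiply by (1 - d_i / n_i)
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set afterwards
        n_at_risk -= len(rows)
    return curve


# Five subjects; those at t = 2 and t = 5 are censored
print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))  # S ≈ 0.8, 0.533, 0.267 at t = 1, 3, 4
```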

Key Characteristics

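Non-parametric: makes no assumptions about the shape of the survival distribution
Accounts for censoring (incomplete follow-up), assuming censoring is non-informative
Provides an empirical estimate of the survival function
The step function decreases only at observed event times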

Kaplan-Meier Curve Elements

Element	Meaning
Y-axis	Estimated cumulative survival probability (0-1)
X-axis	Follow-up time (days/months/years)
Vertical drop	One or more events (death/outcome) occurred at that time
Small mark (+)	Censored observation (e.g., subject lost to follow-up)

When to Use Kaplan-Meier

Time-to-event data with censoring
Comparing survival between groups
Exploratory survival analysis
Visualizing survival distributions

Median Survival Time

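The earliest time at which the Kaplan-Meier estimate S(t) falls to 0.5 or below, i.e., the estimated time by which 50% of subjects have experienced the event. If the curve never drops to 0.5, the median is reported as not reached. It is a clinically meaningful summary for comparing survival between groups.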
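Reading the median off a Kaplan-Meier curve is a scan for the first step at or below 0.5. The minimal sketch below assumes the curve is given as (time, S(t)) pairs sorted by time and uses the "earliest time with S(t) ≤ 0.5" convention; the function name is hypothetical, and some software instead interpolates or takes a midpoint when the curve is flat at exactly 0.5.

```python
def median_survival(curve):
    """Median survival time from (time, S(t)) pairs sorted by time.

    Returns the earliest time with S(t) <= 0.5, or None if the curve never
    falls that low (median "not reached").
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


print(median_survival([(1, 0.8), (3, 0.533), (4, 0.267)]))  # 4
print(median_survival([(1, 0.9), (2, 0.8)]))  # None: median not reached
```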

4. Cox Proportional Hazards Regression

Overview

Cox Proportional Hazards Regression is a semi-parametric method for modeling the relationship between survival time and multiple covariates. Because the baseline hazard is left unspecified, it does not assume a particular distribution for survival times.

Cox Model Equation

h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)

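where h₀(t) is the unspecified baseline hazard function (the hazard when all covariates equal zero), and exp(βⱼ) is the hazard ratio for a one-unit increase in Xⱼ, holding the other covariates fixed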
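Because h₀(t) is left unspecified, β is estimated by maximizing Cox's partial likelihood, which at each event time compares the subject who had the event with everyone still at risk. The minimal sketch below fits a single covariate by Newton-Raphson in pure Python, handling tied event times with the Breslow approximation. The function name is hypothetical; full implementations typically support multiple covariates and alternative tie handling such as Efron's method.

```python
import math


def cox_fit_single(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t|x) = h0(t) * exp(beta * x) by Newton-Raphson on the partial likelihood.

    Tied event times use the Breslow approximation. Returns (beta, standard error);
    exp(beta) is the hazard ratio per one-unit increase in x.
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if not di:
                continue  # censored subjects enter only through the risk sets
            # Risk set R(t_i): everyone still under observation at t_i
            risk = [xk for tk, xk in zip(times, x) if tk >= ti]
            weights = [math.exp(beta * xk) for xk in risk]
            s0 = sum(weights)
            s1 = sum(wk * xk for wk, xk in zip(weights, risk))
            s2 = sum(wk * xk * xk for wk, xk in zip(weights, risk))
            score += xi - s1 / s0  # first derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # minus the second derivative
        if info <= 0.0:
            raise ValueError("covariate has no variation within the risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Toy data: four subjects, all with observed events, binary covariate x
beta, se = cox_fit_single([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 0, 1])
print(round(math.exp(beta), 3))  # 0.618, the positive root of u**2 + u - 1 = 0
```

For this toy data the score equation reduces to u² + u - 1 = 0 with u = exp(β), so the estimated hazard ratio is (√5 - 1)/2 ≈ 0.618.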

Key Assumptions

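Proportional Hazards: The hazard ratio between any two covariate patterns is constant over time
Non-informative (independent) censoring
No unmeasured confounding (required for a causal interpretation of hazard ratios)
Correct functional form of covariates (e.g., linearity of continuous covariates in the log hazard)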

Hazard Ratio Interpretation

HR Value	Interpretation
HR = 1.0	No effect: hazards, and therefore survival curves, are the same in both groups
HR > 1.0	Increased hazard (higher instantaneous event rate)
HR < 1.0	Decreased hazard (protective effect)
HR = 2.0	100% increase in hazard (the hazard is doubled)
HR = 0.5	50% reduction in hazard (the hazard is halved)

Model Diagnostics

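Concordance Index (C-index): Measures discrimination, the probability that of two comparable subjects the one with the higher predicted risk has the event first (0.5 = chance, 1.0 = perfect)
Proportional Hazards Test: Checks the PH assumption, e.g., using scaled Schoenfeld residuals
Residual Analysis: Uses martingale, deviance, or other residuals to check functional form, model fit, and influential observations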
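Harrell's C-index can be computed by counting pairs: a pair is comparable when the subject with the shorter follow-up time had an observed event, and concordant when that subject also has the higher predicted risk (for a Cox model, the linear predictor βX). The minimal sketch below uses a hypothetical function name, counts tied risk scores as half-concordant, and skips pairs with tied times; other tie conventions exist.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and t_i < t_j.
    It is concordant when risk_scores[i] > risk_scores[j]; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # 1.0: higher risk always fails first
```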

Advantages and Limitations

✓ Semi-parametric: Flexible; does not require specifying the survival distribution

✓ Multivariable analysis: Adjusts for multiple covariates simultaneously

✗ Proportional hazards assumption: May be violated in some datasets (e.g., when survival curves cross), so it should be checked before reporting a single hazard ratio

References and Further Reading

Firth, D. (1993). "Bias reduction of maximum likelihood estimates." Biometrika, 80(1), 27-38.
Rosenbaum, P. R., & Rubin, D. B. (1983). "The central role of the propensity score in observational studies." Biometrika, 70(1), 41-55.
Kaplan, E. L., & Meier, P. (1958). "Nonparametric estimation from incomplete observations." Journal of the American Statistical Association, 53(282), 457-481.
Cox, D. R. (1972). "Regression models and life-tables." Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-220.

Citation for Your Research

If you use MedStat for published research, please include this citation in your methods section:

MedStat: Professional Medical Statistics Tool. Available at https://medstat.app
