MedStat Statistical Methods Documentation
This document provides comprehensive technical documentation of all statistical methods available in MedStat. Each section covers methodology, assumptions, applications, and interpretation guidelines for advanced medical statistics analysis.
1. Firth Logistic Regression
Overview
Firth Logistic Regression is a penalized logistic regression method developed by David Firth (1993) that provides improved inference for binary logistic regression, particularly in small samples and in the presence of complete separation.
Mathematical Framework
Standard logistic regression models the probability of a binary outcome:
P(Y=1|X) = exp(Xβ) / (1 + exp(Xβ))
Firth's approach adds a penalty term to the log-likelihood to reduce the small-sample bias of the maximum likelihood estimator:
ℓ*(β) = ℓ(β) + 0.5 × log|I(β)|
where ℓ(β) is the log-likelihood and |I(β)| is the determinant of the Fisher information matrix.
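To make the penalty concrete, here is a minimal sketch for the simplest possible case: an intercept-only logistic model, where the Fisher information is the 1×1 quantity n·p·(1−p). The function names are hypothetical and this is not MedStat's implementation. In this special case the penalized score equation has the closed-form solution p = (events + 0.5) / (n + 1), which the Newton iteration below should reproduce.

```python
import math

def expit(beta):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-beta))

def penalized_log_likelihood(beta, events, n):
    """Firth-penalized log-likelihood for an intercept-only logistic model."""
    p = expit(beta)
    log_lik = events * math.log(p) + (n - events) * math.log(1.0 - p)
    fisher_info = n * p * (1.0 - p)  # 1x1 Fisher information, so |I| = I
    return log_lik + 0.5 * math.log(fisher_info)

def firth_intercept(events, n, tol=1e-10, max_iter=100):
    """Maximize the penalized log-likelihood with Newton's method."""
    beta = 0.0
    for _ in range(max_iter):
        p = expit(beta)
        # First derivative of the penalized log-likelihood with respect to beta.
        score = events - n * p + 0.5 * (1.0 - 2.0 * p)
        # Minus the second derivative (always positive, so the step is well defined).
        curvature = (n + 1.0) * p * (1.0 - p)
        step = score / curvature
        beta += step
        if abs(step) < tol:
            break
    return beta

# All 10 subjects had the event: the ordinary maximum likelihood
# estimate of beta is infinite here, but the penalized estimate is finite.
beta_hat = firth_intercept(events=10, n=10)
p_hat = expit(beta_hat)
print(beta_hat, p_hat)
```

With every subject experiencing the event, the unpenalized estimate diverges, while the penalized estimate stays finite. A real analysis with covariates needs the full matrix form of I(β) and is better left to an established implementation.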
Key Advantages
✓ Handles small samples: Provides reliable estimates with limited data
✓ Complete separation: Addresses situations where outcome is perfectly separated by predictors
✓ Reduced bias: Penalized likelihood reduces bias in coefficient estimates
✓ Finite estimates and confidence intervals: Remain computable under separation, where standard maximum likelihood estimates diverge
Limitations
✗ Computational complexity: More computationally intensive than standard logistic regression
✗ Interpretation: Coefficients represent conditional effects, not marginal effects
Assumptions
- Binary outcome variable (0/1)
- Independence of observations
- No severe multicollinearity among predictors
- Correct model specification
When to Use Firth Logistic Regression
- Small sample sizes (as a rough guide, n < 100)
- Rare events or outcomes
- Evidence of complete separation
- Binary classification problems in medical research
- When standard logistic regression produces infinite coefficients
Interpretation Guide
| Statistic | Interpretation | Example |
| --- | --- | --- |
| Coefficient (β) | Change in log-odds per one-unit increase in the predictor | β=0.5 means the log-odds increase by 0.5 |
| Odds Ratio (OR) | exp(β) - multiplicative change in the odds of the outcome | OR=2.0 means the odds double |
| 95% CI | Range of plausible effect values | CI: 1.2-3.5 excludes the null (OR=1) |
| p-value | Evidence against the null hypothesis of no effect, judged at α=0.05 | p<0.05 indicates a statistically significant effect |
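The quantities in this guide can be derived from a coefficient and its standard error. The sketch below uses a Wald (normal-approximation) interval and p-value, which is an assumption made for simplicity; Firth analyses often report profile penalized-likelihood intervals instead, which this example does not compute. The function name and the input values are hypothetical.

```python
import math

def summarize_coefficient(beta, se, z_crit=1.959963984540054):
    """Turn a log-odds coefficient and its standard error into OR, 95% Wald CI, and p-value."""
    z = beta / se
    return {
        "odds_ratio": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
        "p_value": math.erfc(abs(z) / math.sqrt(2.0)),
    }

# Illustrative values only, not output from any real dataset.
summary = summarize_coefficient(beta=0.5, se=0.2)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the interval is built on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio.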
2. Propensity Score Matching (PSM)
Overview
Propensity Score Matching is an observational study method that estimates the causal effect of a treatment by matching treated and control subjects with similar propensity scores (probability of treatment), creating balanced comparison groups.
Methodological Framework
PSM operates in four steps:
- Propensity Score Estimation: Estimate probability of treatment using logistic regression or other methods
- Common Support Region: Ensure overlap in propensity score distributions
- Matching Algorithm: Match treated and control units with similar scores
- Balance Assessment: Verify covariate balance after matching
Propensity Score Definition
e(X) = P(Z=1|X) = probability of treatment given observed characteristics
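Given estimated scores e(X) for each group, the common support step can be sketched with a simple min-max rule: keep only units whose score lies inside the range covered by both groups. This is one of several possible rules, and the function names and scores below are hypothetical.

```python
def common_support(treated_scores, control_scores):
    """Return the (low, high) propensity score range covered by both groups."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    if low > high:
        raise ValueError("the two score distributions do not overlap")
    return low, high

def indices_in_support(scores, low, high):
    """Indices of units whose score falls inside the overlap region."""
    return [i for i, s in enumerate(scores) if low <= s <= high]

# Illustrative propensity scores, not estimated from real data.
treated = [0.35, 0.50, 0.70, 0.92]
control = [0.05, 0.30, 0.45, 0.60, 0.75]

low, high = common_support(treated, control)
kept_treated = indices_in_support(treated, low, high)
kept_control = indices_in_support(control, low, high)
print(low, high, kept_treated, kept_control)
```

Units outside the overlap region are dropped before matching, so the estimated effect applies only to the population inside it.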
Matching Algorithms Available in MedStat
| Algorithm | Description | Best For |
| --- | --- | --- |
| 1:1 Matching | Each treated unit matched to one control | Preserving sample size |
| 1:2 Matching | Each treated unit matched to two controls | Increasing efficiency, smaller treated group |
| Caliper Matching | Match within a specified distance threshold | Ensuring matching quality |
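As an illustration of how 1:1 and caliper matching combine, the sketch below performs greedy nearest-neighbor matching without replacement. It is not necessarily the algorithm MedStat uses. The caliper is expressed on the raw propensity score scale, which is an assumption, and the function name and scores are hypothetical.

```python
def greedy_caliper_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement, within a caliper."""
    available = set(range(len(control_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        best_idx, best_dist = None, None
        for c_idx in sorted(available):
            dist = abs(t_score - control_scores[c_idx])
            # Only controls inside the caliper are eligible; keep the closest one.
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_idx, best_dist = c_idx, dist
        if best_idx is not None:
            pairs.append((t_idx, best_idx))
            available.remove(best_idx)  # each control is used at most once
    return pairs

# Illustrative propensity scores, not estimated from real data.
treated = [0.30, 0.55, 0.90]
control = [0.28, 0.33, 0.52, 0.60, 0.70]

pairs = greedy_caliper_match(treated, control, caliper=0.05)
print(pairs)  # list of (treated index, control index)
```

The third treated unit has no control within the caliper and stays unmatched. Greedy matching depends on the order in which treated units are processed, which is one reason optimal matching is sometimes preferred.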
Key Advantages
✓ Reduces selection bias: Creates comparability between groups
✓ Causal inference: Enables estimation of causal effects in observational data
✓ Flexible: Works with various treatment definitions and outcomes
Limitations
✗ Hidden bias: Cannot account for unmeasured confounders
✗ Common support: Units outside overlap region excluded
✗ Model dependence: Results depend on propensity score model specification
Assessing Covariate Balance
After matching, evaluate balance using the absolute standardized mean difference of each covariate between the treated and control groups:
- Standardized Difference < 0.1: Good balance
- 0.1 - 0.2: Acceptable balance
- > 0.2: Poor balance - consider alternative matching approach
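A minimal sketch of this check for one continuous covariate follows. It uses a common definition of the standardized difference: the absolute difference in group means divided by the square root of the average of the two group variances. The function names and data are hypothetical, and the thresholds are the ones listed above.

```python
import math
import statistics

def standardized_mean_difference(treated, control):
    """Absolute standardized mean difference for a continuous covariate."""
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2.0
    )
    return abs(mean_diff) / pooled_sd

def balance_label(smd):
    """Classify balance using the thresholds from this section."""
    if smd < 0.1:
        return "good"
    if smd <= 0.2:
        return "acceptable"
    return "poor"

# Illustrative ages in a matched sample, not real data.
age_treated = [50, 60, 70]
age_control = [48, 58, 68]

smd = standardized_mean_difference(age_treated, age_control)
print(round(smd, 3), balance_label(0.05), balance_label(0.35))
```

In practice the check is repeated for every covariate in the propensity score model, before and after matching.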
3. Kaplan-Meier Survival Analysis
Overview
The Kaplan-Meier estimator is a non-parametric method for estimating survival probability over time. It accounts for censored observations (for example, subjects lost to follow-up or still event-free when the study ends) and produces step-function survival curves.
Kaplan-Meier Estimator
S(t) = ∏ᵢ (1 - dᵢ/nᵢ) for all i where tᵢ ≤ t
where dᵢ = number of events at time tᵢ, nᵢ = number at risk at time tᵢ
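The product formula can be sketched directly in a few lines. The function name and data are hypothetical. Subjects censored at a time tᵢ are counted as still at risk for events at that same time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability), ...] for the Kaplan-Meier estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events at time t
        c = 0  # censored at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            survival *= 1.0 - d / n_at_risk  # the (1 - d_i / n_i) factor
            curve.append((t, survival))
        n_at_risk -= d + c  # everyone observed at t leaves the risk set
    return curve

# Illustrative follow-up data: six subjects, two censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

The curve changes only at the four event times (1, 2, 3 and 5). The censored subjects at times 2 and 4 reduce the number at risk without producing a drop.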
Key Characteristics
- Non-parametric - makes no distributional assumptions
- Accounts for censoring (incomplete follow-up)
- Provides empirical survival function
- Step-function decreases only at event times
Kaplan-Meier Curve Elements
| Element | Meaning |
| --- | --- |
| Y-axis | Cumulative survival probability (0-1) |
| X-axis | Follow-up time (days/months/years) |
| Vertical drop | Event (death/outcome) occurred |
| Small mark (+) | Censoring (subject lost to follow-up) |
When to Use Kaplan-Meier
- Time-to-event data with censoring
- Comparing survival between groups
- Exploratory survival analysis
- Visualizing survival distributions
Median Survival Time
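The median survival time is the earliest time at which the estimated survival probability S(t) falls to 0.5 or below, that is, when half of the subjects are estimated to have experienced the event. It is undefined if the curve never reaches 0.5. It is a clinically meaningful summary for comparing survival between groups.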
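Reading the median off an estimated curve can be sketched as follows. The input format, a list of (time, survival probability) pairs sorted by time, and the function name are assumptions made for this example.

```python
def median_survival(curve):
    """Earliest time at which the estimated survival probability is 0.5 or lower.

    curve : list of (time, survival_probability) pairs sorted by time.
    Returns None when the curve never drops to 0.5, which happens when
    fewer than half of the subjects are estimated to have had the event.
    """
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None

# Illustrative step-function values, not real data.
reached = [(1, 5 / 6), (2, 2 / 3), (3, 4 / 9), (5, 0.0)]
not_reached = [(1, 0.9), (4, 0.7)]

median_a = median_survival(reached)
median_b = median_survival(not_reached)
print(median_a, median_b)
```

Returning None rather than the last follow-up time keeps an unreached median from being mistaken for an estimate.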
4. Cox Proportional Hazards Regression
Overview
Cox Proportional Hazards Regression is a semi-parametric method for modeling the relationship between survival time and multiple covariates. It doesn't assume a specific survival distribution.
Cox Model Equation
h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ... + βₚXₚ)
where h₀(t) is the baseline hazard function (unspecified)
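Because h₀(t) multiplies every subject's hazard, it cancels when two covariate profiles are compared, leaving a hazard ratio that depends only on the coefficients. A minimal sketch, with hypothetical coefficients and covariates:

```python
import math

def linear_predictor(beta, x):
    """Compute beta_1*x_1 + ... + beta_p*x_p."""
    return sum(b * v for b, v in zip(beta, x))

def hazard_ratio(beta, x_a, x_b):
    """Hazard of profile a relative to profile b; the baseline hazard cancels."""
    return math.exp(linear_predictor(beta, x_a) - linear_predictor(beta, x_b))

# Hypothetical coefficients: [treatment indicator, age in years].
beta = [math.log(2.0), 0.03]

treated_60 = [1, 60]
control_60 = [0, 60]
hr = hazard_ratio(beta, treated_60, control_60)
print(round(hr, 6))
```

The two profiles differ only in the treatment indicator, so the result equals exp(β₁) regardless of the shared age value.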
Key Assumptions
- Proportional Hazards: Ratio of hazards between groups is constant over time
- Independence of censoring
- No unobserved confounding
- Correct functional form of covariates
Hazard Ratio Interpretation
| HR Value | Interpretation |
| --- | --- |
| HR = 1.0 | No difference in hazard between groups |
| HR > 1.0 | Increased hazard (higher mortality/event rate) |
| HR < 1.0 | Decreased hazard (protective effect) |
| HR = 2.0 | 100% increase in hazard (twice the event rate at any given time) |
| HR = 0.5 | 50% reduction in hazard (half the event rate at any given time) |
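The percentage change in hazard implied by a hazard ratio is (HR − 1) × 100. A short sketch of that arithmetic, with a hypothetical function name:

```python
def hazard_percent_change(hr):
    """Percentage change in hazard relative to the reference group."""
    if hr <= 0:
        raise ValueError("a hazard ratio must be positive")
    return (hr - 1.0) * 100.0

for hr in (1.0, 2.0, 0.5):
    print(hr, hazard_percent_change(hr))
```

A hazard ratio of 2.0 therefore corresponds to a 100% increase and 0.5 to a 50% decrease. These are changes in the instantaneous event rate, not in the cumulative probability of the event.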
Model Diagnostics
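- Concordance Index (C-index): Measures discrimination, the probability that the model correctly ranks a random comparable pair of subjects (0.5 = chance, 1.0 = perfect)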
- Proportional Hazards Test: Validate PH assumption
- Residuals Analysis: Check model fit and outliers
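To show what the concordance index measures, here is a simplified sketch of Harrell's C. A pair is usable when the subject with the shorter time had an observed event, and it is concordant when that subject also has the higher predicted risk. Pairs with tied times are skipped, which is a simplification, and the function name and data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C: fraction of usable pairs ranked correctly."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: subject i had an observed event strictly before
            # subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Illustrative data: higher risk score should mean earlier event.
times = [5, 3, 8, 2]
events = [1, 1, 0, 1]
risk = [0.4, 0.7, 0.1, 0.9]

c_perfect = concordance_index(times, events, risk)
c_reversed = concordance_index(times, events, [-r for r in risk])
print(c_perfect, c_reversed)
```

Negating the risk scores reverses every ranking, so the index moves from its maximum to its minimum. A model with no discriminative ability would score about 0.5.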
Advantages and Limitations
✓ Semi-parametric: Flexible - doesn't require specifying survival distribution
✓ Multivariate analysis: Adjust for multiple covariates simultaneously
✗ Proportional hazards assumption: May be violated in some datasets
References and Further Reading
- Firth, D. (1993). "Bias reduction of maximum likelihood estimates." Biometrika, 80(1), 27-38.
- Rosenbaum, P. R., & Rubin, D. B. (1983). "The central role of the propensity score in observational studies." Biometrika, 70(1), 41-55.
- Kaplan, E. L., & Meier, P. (1958). "Nonparametric estimation from incomplete observations." Journal of the American Statistical Association, 53(282), 457-481.
- Cox, D. R. (1972). "Regression models and life-tables." Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-220.
Citation for Your Research
If you use MedStat for published research, please include this citation in your methods section:
MedStat: Professional Medical Statistics Tool. Available at https://medstat.app