def compute_effsize_from_t(tval, nx=None, ny=None, N=None, eftype='cohen'):
    """Compute effect size from a T-value.

    Parameters
    ----------
    tval : float
        T-value
    nx, ny : int, optional
        Group sample sizes.
    N : int, optional
        Total sample size (will not be used if nx and ny are specified)
    eftype : string, optional
        Desired output effect size.

    Returns
    -------
    ef : float
        Effect size

    See Also
    --------
    compute_effsize : Calculate effect size between two sets of observations.
    convert_effsize : Conversion between effect sizes.

    Notes
    -----
    If both nx and ny are specified, the formula to convert from *t* to *d* is:

    .. math:: d = t * \\sqrt{\\frac{1}{n_x} + \\frac{1}{n_y}}

    If only N (total sample size) is specified, the formula is:

    .. math:: d = \\frac{2t}{\\sqrt{N}}

    Examples
    --------
    1. Compute effect size from a T-value when both sample sizes are known.

    >>> from pingouin import compute_effsize_from_t
    >>> tval, nx, ny = 2.90, 35, 25
    >>> d = compute_effsize_from_t(tval, nx=nx, ny=ny, eftype='cohen')
    >>> print(d)
    0.7593982580212534

    2. Compute effect size when only total sample size is known (nx+ny)

    >>> tval, N = 2.90, 60
    >>> d = compute_effsize_from_t(tval, N=N, eftype='cohen')
    >>> print(d)
    0.7487767802667672
    """
    if not _check_eftype(eftype):
        err = "Could not interpret input '{}'".format(eftype)
        raise ValueError(err)

    if not isinstance(tval, float):
        err = "T-value must be float"
        raise ValueError(err)

    # Compute Cohen d (Lakens, 2013)
    if nx is not None and ny is not None:
        d = tval * np.sqrt(1 / nx + 1 / ny)
    elif N is not None:
        d = 2 * tval / np.sqrt(N)
    else:
        raise ValueError('You must specify either nx + ny, or just N')
    return convert_effsize(d, 'cohen', eftype, nx=nx, ny=ny)
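# A minimal standalone sketch of the two t-to-d formulas used in
# compute_effsize_from_t, written with the standard library only. The helper
# name `t_to_cohen_d` is hypothetical and is not part of Pingouin. It is meant
# to show that the N-only formula matches the nx/ny formula when the groups
# are balanced (nx == ny == N / 2), and differs slightly otherwise.

```python
import math


def t_to_cohen_d(tval, nx=None, ny=None, N=None):
    """Hypothetical helper: convert a t-value to Cohen d."""
    if nx is not None and ny is not None:
        # Group sizes known: d = t * sqrt(1 / nx + 1 / ny)
        return tval * math.sqrt(1 / nx + 1 / ny)
    if N is not None:
        # Only the total sample size known: d = 2t / sqrt(N)
        return 2 * tval / math.sqrt(N)
    raise ValueError('You must specify either nx + ny, or just N')


# Unbalanced groups (values taken from the docstring example)
d_groups = t_to_cohen_d(2.90, nx=35, ny=25)
# Same total sample size, group sizes unknown
d_total = t_to_cohen_d(2.90, N=60)
# Balanced groups with the same total sample size
d_balanced = t_to_cohen_d(2.90, nx=30, ny=30)
```

# With nx = ny = N / 2, sqrt(1 / nx + 1 / ny) reduces to 2 / sqrt(N), so
# d_balanced and d_total agree up to floating-point rounding, while d_groups
# is slightly larger for the unbalanced 35/25 split.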
def compute_effsize(x, y, paired=False, eftype='cohen'):
    """Calculate effect size between two sets of observations.

    Parameters
    ----------
    x : np.array or list
        First set of observations.
    y : np.array or list
        Second set of observations.
    paired : boolean
        If True, uses Cohen d-avg formula to correct for repeated
        measurements (see Notes).
    eftype : string
        Desired output effect size. Available methods are:

        * ``'none'``: no effect size
        * ``'cohen'``: Unbiased Cohen d
        * ``'hedges'``: Hedges g
        * ``'glass'``: Glass delta
        * ``'r'``: correlation coefficient
        * ``'eta-square'``: Eta-square
        * ``'odds-ratio'``: Odds ratio
        * ``'AUC'``: Area Under the Curve
        * ``'CLES'``: Common Language Effect Size

    Returns
    -------
    ef : float
        Effect size

    See Also
    --------
    convert_effsize : Conversion between effect sizes.
    compute_effsize_from_t : Convert a T-statistic to an effect size.

    Notes
    -----
    Missing values are automatically removed from the data. If ``x`` and
    ``y`` are paired, the entire row is removed.

    If ``x`` and ``y`` are independent, the Cohen :math:`d` is:

    .. math::

        d = \\frac{\\overline{X} - \\overline{Y}}
        {\\sqrt{\\frac{(n_{1} - 1)\\sigma_{1}^{2} + (n_{2} - 1)
        \\sigma_{2}^{2}}{n_{1} + n_{2} - 2}}}

    If ``x`` and ``y`` are paired, the Cohen :math:`d_{avg}` is computed:

    .. math::

        d_{avg} = \\frac{\\overline{X} - \\overline{Y}}
        {\\sqrt{\\frac{(\\sigma_1^2 + \\sigma_2^2)}{2}}}

    Cohen's d is a biased estimate of the population effect size,
    especially for small samples (n < 20). It is often preferable to use the
    corrected Hedges :math:`g` instead:

    .. math:: g = d \\times (1 - \\frac{3}{4(n_1 + n_2) - 9})

    The Glass :math:`\\delta` is calculated using the standard deviation of
    the group with the lowest variance as the control group:

    .. math::

        \\delta = \\frac{\\overline{X} -
        \\overline{Y}}{\\sigma_{\\text{control}}}

    The common language effect size is the proportion of pairs where ``x`` is
    higher than ``y`` (calculated with a brute-force approach where
    each observation of ``x`` is paired to each observation of ``y``,
    see :py:func:`pingouin.wilcoxon` for more details):

    .. math:: \\text{CL} = P(X > Y) + .5 \\times P(X = Y)

    For other effect sizes, Pingouin will first calculate a Cohen :math:`d`
    and then use the :py:func:`pingouin.convert_effsize` to convert to the
    desired effect size.

    References
    ----------
    * Lakens, D., 2013. Calculating and reporting effect sizes to
      facilitate cumulative science: a practical primer for t-tests and
      ANOVAs. Front. Psychol. 4, 863.
      https://doi.org/10.3389/fpsyg.2013.00863

    * Cumming, Geoff. Understanding the new statistics: Effect sizes,
      confidence intervals, and meta-analysis. Routledge, 2013.

    * https://osf.io/vbdah/

    Examples
    --------
    1. Cohen d from two independent samples.

    >>> import numpy as np
    >>> import pingouin as pg
    >>> x = [1, 2, 3, 4]
    >>> y = [3, 4, 5, 6, 7]
    >>> pg.compute_effsize(x, y, paired=False, eftype='cohen')
    -1.707825127659933

    The sign of the Cohen d will be opposite if we reverse the order of
    ``x`` and ``y``:

    >>> pg.compute_effsize(y, x, paired=False, eftype='cohen')
    1.707825127659933

    2. Hedges g from two paired samples.

    >>> x = [1, 2, 3, 4, 5, 6, 7]
    >>> y = [1, 3, 5, 7, 9, 11, 13]
    >>> pg.compute_effsize(x, y, paired=True, eftype='hedges')
    -0.8222477210374874

    3. Glass delta from two independent samples. The group with the lowest
    variance will automatically be selected as the control.

    >>> pg.compute_effsize(x, y, paired=False, eftype='glass')
    -1.3887301496588271

    4. Common Language Effect Size.

    >>> pg.compute_effsize(x, y, eftype='cles')
    0.2857142857142857

    In other words, ``x`` scores higher than ``y`` in ~29% of pairs (ties
    counted as one half), which means that ``x`` scores *lower* than ``y``
    in ~71% of pairs. This can be easily verified by changing the order of
    ``x`` and ``y``:

    >>> pg.compute_effsize(y, x, eftype='cles')
    0.7142857142857143
    """
    # Check arguments
    if not _check_eftype(eftype):
        err = "Could not interpret input '{}'".format(eftype)
        raise ValueError(err)

    x = np.asarray(x)
    y = np.asarray(y)

    if x.size != y.size and paired:
        warnings.warn("x and y have unequal sizes. Switching to "
                      "paired == False.")
        paired = False

    # Remove rows with missing values
    x, y = remove_na(x, y, paired=paired)
    nx, ny = x.size, y.size

    if ny == 1:
        # Case 1: One-sample Test
        d = (x.mean() - y) / x.std(ddof=1)
        return d
    if eftype.lower() == 'glass':
        # Find group with lowest variance
        sd_control = np.min([x.std(ddof=1), y.std(ddof=1)])
        d = (x.mean() - y.mean()) / sd_control
        return d
    elif eftype.lower() == 'r':
        # Return correlation coefficient (useful for CI bootstrapping)
        from scipy.stats import pearsonr
        r, _ = pearsonr(x, y)
        return r
    elif eftype.lower() == 'cles':
        # Compute exact CLES (see pingouin.wilcoxon)
        diff = x[:, None] - y
        return np.where(diff == 0, 0.5, diff > 0).mean()
    else:
        # Test equality of variance of data with a stringent threshold
        # equal_var, p = homoscedasticity(x, y, alpha=.001)
        # if not equal_var:
        #     print('Unequal variances (p<.001). You should report',
        #           'Glass delta instead.')

        # Compute unbiased Cohen's d effect size
        if not paired:
            # https://en.wikipedia.org/wiki/Effect_size
            dof = nx + ny - 2
            poolsd = np.sqrt(
                ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof)
            d = (x.mean() - y.mean()) / poolsd
        else:
            # Report Cohen d-avg (Cumming 2012; Lakens 2013)
            # Careful, the formula in Lakens 2013 is wrong. Updated in Pingouin
            # v0.3.4 to use the formula provided by Cumming 2012.
            # Before that the denominator was just (SD1 + SD2) / 2
            d = (x.mean() - y.mean()) / np.sqrt(
                (x.var(ddof=1) + y.var(ddof=1)) / 2)
        return convert_effsize(d, 'cohen', eftype, nx=nx, ny=ny)
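# A standalone sketch of the main formulas in compute_effsize, using only the
# standard library so it runs without NumPy or Pingouin. All helper names
# below (`cohen_d_independent`, `cohen_d_avg`, `hedges_correction`, `cles`)
# are hypothetical and only mirror the formulas described in the Notes; they
# skip the missing-value removal and input checks of the real function.

```python
from statistics import mean, variance


def cohen_d_independent(x, y):
    """Cohen d with a pooled standard deviation (independent samples)."""
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (same as ddof=1 in NumPy)
    pooled_var = (((nx - 1) * variance(x) + (ny - 1) * variance(y))
                  / (nx + ny - 2))
    return (mean(x) - mean(y)) / pooled_var ** 0.5


def cohen_d_avg(x, y):
    """Cohen d-avg: denominator is the root of the mean of both variances."""
    return (mean(x) - mean(y)) / ((variance(x) + variance(y)) / 2) ** 0.5


def hedges_correction(d, nx, ny):
    """Small-sample correction that turns Cohen d into Hedges g."""
    return d * (1 - 3 / (4 * (nx + ny) - 9))


def cles(x, y):
    """Brute-force CLES: P(X > Y) + 0.5 * P(X == Y) over all (x, y) pairs."""
    scores = [1.0 if xi > yi else 0.5 if xi == yi else 0.0
              for xi in x for yi in y]
    return sum(scores) / len(scores)


# Data taken from the docstring examples
d_ind = cohen_d_independent([1, 2, 3, 4], [3, 4, 5, 6, 7])
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
g_paired = hedges_correction(cohen_d_avg(x, y), len(x), len(y))
cl = cles(x, y)
```

# Under these assumptions the three values reproduce the docstring outputs for
# examples 1, 2 and 4 up to floating-point rounding.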
def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
    """Conversion between effect sizes.

    Parameters
    ----------
    ef : float
        Original effect size.
    input_type : string
        Effect size type of ef. Must be ``'r'`` or ``'cohen'``.
    output_type : string
        Desired effect size type. Available methods are:

        * ``'cohen'``: Unbiased Cohen d
        * ``'hedges'``: Hedges g
        * ``'r'``: correlation coefficient
        * ``'eta-square'``: Eta-square
        * ``'odds-ratio'``: Odds ratio
        * ``'AUC'``: Area Under the Curve
        * ``'none'``: pass-through (return ``ef``)

    nx, ny : int, optional
        Length of vector x and y. Required to convert to Hedges g.

    Returns
    -------
    ef : float
        Desired converted effect size

    See Also
    --------
    compute_effsize : Calculate effect size between two sets of observations.
    compute_effsize_from_t : Convert a T-statistic to an effect size.

    Notes
    -----
    The formula to convert **r** to **d** is given in [1]_:

    .. math:: d = \\frac{2r}{\\sqrt{1 - r^2}}

    The formula to convert **d** to **r** is given in [2]_:

    .. math::

        r = \\frac{d}{\\sqrt{d^2 + \\frac{(n_x + n_y)^2 - 2(n_x + n_y)}
        {n_xn_y}}}

    If ``nx`` and ``ny`` are not specified, the fraction under the square
    root is replaced by 4.

    The formula to convert **d** to :math:`\\eta^2` is given in [3]_:

    .. math:: \\eta^2 = \\frac{(0.5 d)^2}{1 + (0.5 d)^2}

    The formula to convert **d** to an odds-ratio is given in [4]_:

    .. math:: \\text{OR} = \\exp (\\frac{d \\pi}{\\sqrt{3}})

    The formula to convert **d** to area under the curve is given in [5]_:

    .. math:: \\text{AUC} = \\mathcal{N}_{cdf}(\\frac{d}{\\sqrt{2}})

    References
    ----------
    .. [1] Rosenthal, Robert. "Parametric measures of effect size."
       The handbook of research synthesis 621 (1994): 231-244.

    .. [2] McGrath, Robert E., and Gregory J. Meyer. "When effect sizes
       disagree: the case of r and d." Psychological methods 11.4 (2006): 386.

    .. [3] Cohen, Jacob. "Statistical power analysis for the behavioral
       sciences. 2nd." (1988).

    .. [4] Borenstein, Michael, et al. "Effect sizes for continuous data."
       The handbook of research synthesis and meta-analysis 2 (2009): 221-235.

    .. [5] Ruscio, John. "A probability-based measure of effect size:
       Robustness to base rates and other factors." Psychological methods 13.1
       (2008): 19.

    Examples
    --------
    1. Convert from Cohen d to eta-square

    >>> import pingouin as pg
    >>> d = .45
    >>> eta = pg.convert_effsize(d, 'cohen', 'eta-square')
    >>> print(eta)
    0.048185603807257595

    2. Convert from Cohen d to Hedges g (requires the sample sizes of each
       group)

    >>> pg.convert_effsize(.45, 'cohen', 'hedges', nx=10, ny=10)
    0.4309859154929578

    3. Convert Pearson r to Cohen d

    >>> r = 0.40
    >>> d = pg.convert_effsize(r, 'r', 'cohen')
    >>> print(d)
    0.8728715609439696

    4. Reverse operation: convert Cohen d to Pearson r

    >>> pg.convert_effsize(d, 'cohen', 'r')
    0.4000000000000001
    """
    it = input_type.lower()
    ot = output_type.lower()

    # Check input and output type
    for input in [it, ot]:
        if not _check_eftype(input):
            err = "Could not interpret input '{}'".format(input)
            raise ValueError(err)
    if it not in ['r', 'cohen']:
        raise ValueError("Input type must be 'r' or 'cohen'")

    # Pass-through option
    if it == ot or ot == 'none':
        return ef

    # Convert r to Cohen d (Rosenthal 1994)
    d = (2 * ef) / np.sqrt(1 - ef**2) if it == 'r' else ef

    # Then convert to the desired output type
    if ot == 'cohen':
        return d
    elif ot == 'hedges':
        if all(v is not None for v in [nx, ny]):
            return d * (1 - (3 / (4 * (nx + ny) - 9)))
        else:
            # If shapes of x and y are not known, return cohen's d
            warnings.warn("You need to pass nx and ny arguments to compute "
                          "Hedges g. Returning Cohen's d instead")
            return d
    elif ot == 'glass':
        warnings.warn("Returning original effect size instead of Glass "
                      "because variance is not known.")
        return ef
    elif ot == 'r':
        # McGrath and Meyer 2006
        if all(v is not None for v in [nx, ny]):
            a = ((nx + ny)**2 - 2 * (nx + ny)) / (nx * ny)
        else:
            a = 4
        return d / np.sqrt(d**2 + a)
    elif ot == 'eta-square':
        # Cohen 1988
        return (d / 2)**2 / (1 + (d / 2)**2)
    elif ot == 'odds-ratio':
        # Borenstein et al. 2009
        return np.exp(d * np.pi / np.sqrt(3))
    else:  # ['auc']
        # Ruscio 2008
        from scipy.stats import norm
        return norm.cdf(d / np.sqrt(2))
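# A standalone sketch of the conversion formulas used by convert_effsize,
# written with the standard library only (statistics.NormalDist stands in for
# scipy.stats.norm). The helper names are hypothetical and not part of
# Pingouin. The sketch illustrates that r -> d -> r is an exact round trip
# only when the sample sizes are omitted, i.e. when the constant 4 is used in
# the d-to-r denominator.

```python
import math
from statistics import NormalDist


def r_to_d(r):
    """Rosenthal (1994): d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d, nx=None, ny=None):
    """McGrath and Meyer (2006), falling back to a = 4 without nx and ny."""
    if nx is not None and ny is not None:
        a = ((nx + ny) ** 2 - 2 * (nx + ny)) / (nx * ny)
    else:
        a = 4
    return d / math.sqrt(d ** 2 + a)


def d_to_eta_square(d):
    """Cohen (1988): eta^2 = (d / 2)^2 / (1 + (d / 2)^2)."""
    return (d / 2) ** 2 / (1 + (d / 2) ** 2)


def d_to_odds_ratio(d):
    """Borenstein et al. (2009): OR = exp(d * pi / sqrt(3))."""
    return math.exp(d * math.pi / math.sqrt(3))


def d_to_auc(d):
    """Ruscio (2008): AUC = standard normal CDF evaluated at d / sqrt(2)."""
    return NormalDist().cdf(d / math.sqrt(2))


d = r_to_d(0.40)               # docstring example 3
r_back = d_to_r(d)             # docstring example 4, recovers r = 0.40
r_small = d_to_r(d, nx=10, ny=10)  # finite balanced groups: a = 4 - 4 / 10
eta = d_to_eta_square(0.45)    # docstring example 1
```

# For balanced groups of size n the correction term equals 4 - 4 / n, so
# r_small is slightly larger than r_back and approaches it as n grows.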
def test_check_eftype(self):
    """Test function _check_eftype."""
    eftype = 'cohen'
    _check_eftype(eftype)
    eftype = 'fake'
    _check_eftype(eftype)
def compute_effsize(x, y, paired=False, eftype='cohen'):
    """Calculate effect size between two sets of observations.

    Parameters
    ----------
    x : np.array or list
        First set of observations.
    y : np.array or list
        Second set of observations.
    paired : boolean
        If True, uses Cohen d-avg formula to correct for repeated
        measurements (Cumming 2012)
    eftype : string
        Desired output effect size.
        Available methods are ::

        'none' : no effect size
        'cohen' : Unbiased Cohen d
        'hedges' : Hedges g
        'glass': Glass delta
        'r' : correlation coefficient
        'eta-square' : Eta-square
        'odds-ratio' : Odds ratio
        'AUC' : Area Under the Curve
        'CLES' : Common language effect size

    Returns
    -------
    ef : float
        Effect size

    See Also
    --------
    convert_effsize : Conversion between effect sizes.
    compute_effsize_from_t : Convert a T-statistic to an effect size.

    Notes
    -----
    Missing values are automatically removed from the data. If ``x`` and
    ``y`` are paired, the entire row is removed.

    If ``x`` and ``y`` are independent, the Cohen's d is:

    .. math::

        d = \\frac{\\overline{X} - \\overline{Y}}
        {\\sqrt{\\frac{(n_{1} - 1)\\sigma_{1}^{2} + (n_{2} - 1)
        \\sigma_{2}^{2}}{n_{1} + n_{2} - 2}}}

    If ``x`` and ``y`` are paired, the Cohen :math:`d_{avg}` is computed:

    .. math::

        d_{avg} = \\frac{\\overline{X} - \\overline{Y}}
        {0.5 * (\\sigma_1 + \\sigma_2)}

    Cohen's d is a biased estimate of the population effect size,
    especially for small samples (n < 20). It is often preferable to use the
    corrected effect size, or Hedges g, instead:

    .. math:: g = d * (1 - \\frac{3}{4(n_1 + n_2) - 9})

    If eftype = 'glass', the Glass :math:`\\delta` is reported, using the
    group with the lowest variance as the control group:

    .. math::

        \\delta = \\frac{\\overline{X} - \\overline{Y}}{\\sigma_{control}}

    References
    ----------
    .. [1] Lakens, D., 2013. Calculating and reporting effect sizes to
       facilitate cumulative science: a practical primer for t-tests and
       ANOVAs. Front. Psychol. 4, 863.
       https://doi.org/10.3389/fpsyg.2013.00863

    .. [2] Cumming, Geoff. Understanding the new statistics: Effect sizes,
       confidence intervals, and meta-analysis. Routledge, 2013.

    Examples
    --------
    1. Compute Cohen d from two independent sets of observations.

    >>> import numpy as np
    >>> from pingouin import compute_effsize
    >>> np.random.seed(123)
    >>> x = np.random.normal(2, size=100)
    >>> y = np.random.normal(2.3, size=95)
    >>> d = compute_effsize(x=x, y=y, eftype='cohen', paired=False)
    >>> print(d)
    -0.2835170152506578

    2. Compute Hedges g from two paired sets of observations.

    >>> import numpy as np
    >>> from pingouin import compute_effsize
    >>> x = [1.62, 2.21, 3.79, 1.66, 1.86, 1.87, 4.51, 4.49, 3.3 , 2.69]
    >>> y = [0.91, 3., 2.28, 0.49, 1.42, 3.65, -0.43, 1.57, 3.27, 1.13]
    >>> g = compute_effsize(x=x, y=y, eftype='hedges', paired=True)
    >>> print(g)
    0.8370985097811404

    3. Compute Glass delta from two independent sets of observations. The
    group with the lowest variance will automatically be selected as the
    control.

    >>> import numpy as np
    >>> from pingouin import compute_effsize
    >>> np.random.seed(123)
    >>> x = np.random.normal(2, scale=1, size=50)
    >>> y = np.random.normal(2, scale=2, size=45)
    >>> d = compute_effsize(x=x, y=y, eftype='glass')
    >>> print(d)
    -0.1170721973604153
    """
    # Check arguments
    if not _check_eftype(eftype):
        err = "Could not interpret input '{}'".format(eftype)
        raise ValueError(err)

    x = np.asarray(x)
    y = np.asarray(y)

    if x.size != y.size and paired:
        warnings.warn("x and y have unequal sizes. Switching to "
                      "paired == False.")
        paired = False

    # Remove rows with missing values
    x, y = remove_na(x, y, paired=paired)
    nx, ny = x.size, y.size

    if ny == 1:
        # Case 1: One-sample Test
        d = (x.mean() - y) / x.std(ddof=1)
        return d
    if eftype.lower() == 'glass':
        # Find group with lowest variance
        sd_control = np.min([x.std(ddof=1), y.std(ddof=1)])
        d = (x.mean() - y.mean()) / sd_control
        return d
    elif eftype.lower() == 'r':
        # Return correlation coefficient (useful for CI bootstrapping)
        from scipy.stats import pearsonr
        r, _ = pearsonr(x, y)
        return r
    elif eftype.lower() == 'cles':
        # Compute exact CLES
        diff = x[:, None] - y
        return max((diff < 0).sum(), (diff > 0).sum()) / diff.size
    else:
        # Test equality of variance of data with a stringent threshold
        # equal_var, p = homoscedasticity(x, y, alpha=.001)
        # if not equal_var:
        #     print('Unequal variances (p<.001). You should report',
        #           'Glass delta instead.')

        # Compute unbiased Cohen's d effect size
        if not paired:
            # https://en.wikipedia.org/wiki/Effect_size
            dof = nx + ny - 2
            poolsd = np.sqrt(((nx - 1) * x.var(ddof=1)
                              + (ny - 1) * y.var(ddof=1)) / dof)
            d = (x.mean() - y.mean()) / poolsd
        else:
            # Report Cohen d-avg (Cumming 2012; Lakens 2013)
            d = (x.mean() - y.mean()) / (.5 * (x.std(ddof=1)
                                               + y.std(ddof=1)))
        return convert_effsize(d, 'cohen', eftype, nx=nx, ny=ny)
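# A small standalone sketch comparing the two d-avg denominators that appear
# in this file: the mean of the two standard deviations used in the version
# directly above, and the square root of the mean variance used in the other
# version of compute_effsize. The helper names are hypothetical and the sketch
# uses the standard library only; the data come from the paired docstring
# example.

```python
from statistics import mean, stdev, variance


def d_avg_mean_sd(x, y):
    """d-avg with denominator 0.5 * (SD1 + SD2)."""
    return (mean(x) - mean(y)) / (0.5 * (stdev(x) + stdev(y)))


def d_avg_rms_sd(x, y):
    """d-avg with denominator sqrt((SD1^2 + SD2^2) / 2)."""
    return (mean(x) - mean(y)) / ((variance(x) + variance(y)) / 2) ** 0.5


x = [1.62, 2.21, 3.79, 1.66, 1.86, 1.87, 4.51, 4.49, 3.3, 2.69]
y = [0.91, 3., 2.28, 0.49, 1.42, 3.65, -0.43, 1.57, 3.27, 1.13]
d_mean_sd = d_avg_mean_sd(x, y)
d_rms_sd = d_avg_rms_sd(x, y)

# With equal standard deviations both denominators coincide
same_sd_a = d_avg_mean_sd([1, 2, 3], [2, 3, 4])
same_sd_b = d_avg_rms_sd([1, 2, 3], [2, 3, 4])
```

# The root-mean-square of two standard deviations is never smaller than their
# arithmetic mean, so the second form gives an effect size that is equal or
# smaller in magnitude; the gap grows as the two variances diverge.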
def convert_effsize(ef, input_type, output_type, nx=None, ny=None):
    """Conversion between effect sizes.

    Parameters
    ----------
    ef : float
        Original effect size
    input_type : string
        Effect size type of ef. Must be 'r' or 'cohen'.
    output_type : string
        Desired effect size type.
        Available methods are ::

        'none' : no effect size
        'cohen' : Unbiased Cohen d
        'hedges' : Hedges g
        'glass': Glass delta
        'eta-square' : Eta-square
        'odds-ratio' : Odds ratio
        'AUC' : Area Under the Curve
    nx, ny : int, optional
        Length of vector x and y.
        nx and ny are required to convert to Hedges g

    Returns
    -------
    ef : float
        Desired converted effect size

    See Also
    --------
    compute_effsize : Calculate effect size between two sets of observations.
    compute_effsize_from_t : Convert a T-statistic to an effect size.

    Notes
    -----
    The formula to convert **r** to **d** is given in ref [1]:

    .. math:: d = \\dfrac{2r}{\\sqrt{1 - r^2}}

    The formula to convert **d** to **r** is given in ref [2]:

    .. math::

        r = \\dfrac{d}{\\sqrt{d^2 + \\dfrac{(n_x + n_y)^2 - 2(n_x + n_y)}
        {n_xn_y}}}

    The formula to convert **d** to :math:`\\eta^2` is given in ref [3]:

    .. math:: \\eta^2 = \\dfrac{(0.5 * d)^2}{1 + (0.5 * d)^2}

    The formula to convert **d** to an odds-ratio is given in ref [4]:

    .. math:: or = \\exp(\\dfrac{d * \\pi}{\\sqrt{3}})

    The formula to convert **d** to area under the curve is given in ref [5]:

    .. math:: auc = \\mathcal{N}_{cdf}(\\dfrac{d}{\\sqrt{2}})

    References
    ----------
    .. [1] Rosenthal, Robert. "Parametric measures of effect size."
       The handbook of research synthesis 621 (1994): 231-244.

    .. [2] McGrath, Robert E., and Gregory J. Meyer. "When effect sizes
       disagree: the case of r and d." Psychological methods 11.4 (2006): 386.

    .. [3] Cohen, Jacob. "Statistical power analysis for the behavioral
       sciences. 2nd." (1988).

    .. [4] Borenstein, Michael, et al. "Effect sizes for continuous data."
       The handbook of research synthesis and meta-analysis 2 (2009): 221-235.

    .. [5] Ruscio, John. "A probability-based measure of effect size:
       Robustness to base rates and other factors." Psychological methods 13.1
       (2008): 19.

    Examples
    --------
    1. Convert from Cohen d to eta-square

    >>> from pingouin import convert_effsize
    >>> d = .45
    >>> eta = convert_effsize(d, 'cohen', 'eta-square')
    >>> print(eta)
    0.05

    2. Convert from Cohen d to Hedges g (requires the sample sizes of each
       group)

    >>> d = .45
    >>> g = convert_effsize(d, 'cohen', 'hedges', nx=10, ny=10)
    >>> print(g)
    0.43

    3. Convert Pearson r to Cohen d

    >>> r = 0.40
    >>> d = convert_effsize(r, 'r', 'cohen')
    >>> print(d)
    0.87

    4. Reverse operation: convert Cohen d to Pearson r

    >>> d = 0.873
    >>> r = convert_effsize(d, 'cohen', 'r')
    >>> print(r)
    0.40
    """
    it = input_type.lower()
    ot = output_type.lower()

    # Check input and output type
    for input in [it, ot]:
        if not _check_eftype(input):
            err = "Could not interpret input '{}'".format(input)
            raise ValueError(err)
    if it not in ['r', 'cohen']:
        raise ValueError("Input type must be 'r' or 'cohen'")

    if it == ot:
        return ef

    d = (2 * ef) / np.sqrt(1 - ef**2) if it == 'r' else ef  # Rosenthal 1994

    # Then convert to the desired output type
    if ot == 'cohen':
        return d
    elif ot == 'hedges':
        if all(v is not None for v in [nx, ny]):
            return d * (1 - (3 / (4 * (nx + ny) - 9)))
        else:
            # If shapes of x and y are not known, return cohen's d
            print("You need to pass nx and ny arguments to compute Hedges g.",
                  "Returning Cohen's d instead")
            return d
    elif ot == 'glass':
        print("Returning original effect size instead of Glass because",
              "variance is not known.")
        return ef
    elif ot == 'r':
        # McGrath and Meyer 2006
        if all(v is not None for v in [nx, ny]):
            a = ((nx + ny)**2 - 2 * (nx + ny)) / (nx * ny)
        else:
            a = 4
        return d / np.sqrt(d**2 + a)
    elif ot == 'eta-square':
        # Cohen 1988
        return (d / 2)**2 / (1 + (d / 2)**2)
    elif ot == 'odds-ratio':
        # Borenstein et al. 2009
        return np.exp(d * np.pi / np.sqrt(3))
    elif ot == 'auc':
        # Ruscio 2008
        from scipy.stats import norm
        return norm.cdf(d / np.sqrt(2))
    else:
        return None