## Similarity: Mutual Information

The distributional similarity $P(i,j)$ between terms $i$ and $j$ is:

$$
P(i,j) = \frac{\sum_{k \neq i,j;\; M_{ik} > 0} \min(M_{ik}, M_{jk})}{\sum_{k \neq i,j;\; M_{ik} > 0} M_{ik}},
$$

where $M_{ij}$ is defined as

$$
M_{ij} = \log\left(\frac{C_{ij}}{E_{ij}}\right),
$$

with $C_{ij}$ the number of word bags containing co-occurrences of $i$ and $j$, and $E_{ij}$ the expected number of co-occurrences (given a map list of size $m$):

$$
E_{ij} = \frac{S_i S_j}{N_m},
$$

with $S_i$ the total number of co-occurrences of term $i$,

$$
S_i = \sum_{j=1,\, j \neq i}^{m} C_{ij},
$$

and $N_m$ the total number of co-occurrences of terms with a map list of size $m$:

$$
N_m = \sum_{i=1}^{m} S_i = \sum_{i=1}^{m} \sum_{j=1,\, j \neq i}^{m} C_{ij}.
$$

:::info
GargStamp: ca80367ce4c78565382ce840c378291a6ccb67b48776f4d082934fed5110bb8c
:::
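The definitions above can be sketched as a short NumPy implementation. This is a minimal sketch, not the reference implementation: the function name `distributional_similarity` is hypothetical, absent co-occurrences ($C_{ij} = 0$) are treated as contributing $M_{ij} = 0$ (an implementation choice), and the denominator is taken as $\sum M_{ik}$ over the same index set as the numerator.

```python
import numpy as np

def distributional_similarity(C):
    """Distributional similarity matrix P computed from a symmetric
    co-occurrence count matrix C (the diagonal is ignored).
    Hypothetical helper, sketching the formulas in this section."""
    C = np.asarray(C, dtype=float).copy()
    m = C.shape[0]
    np.fill_diagonal(C, 0.0)
    S = C.sum(axis=1)                  # S_i = sum_{j != i} C_ij
    N = S.sum()                        # N_m = sum_i S_i
    E = np.outer(S, S) / N             # E_ij = S_i * S_j / N_m
    with np.errstate(divide="ignore", invalid="ignore"):
        M = np.log(C / E)              # M_ij = log(C_ij / E_ij)
    M[C == 0] = 0.0                    # assumption: no co-occurrence -> M_ij = 0

    P = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            # sum over k != i, j restricted to M_ik > 0
            num = den = 0.0
            for k in range(m):
                if k == i or k == j or M[i, k] <= 0:
                    continue
                num += min(M[i, k], M[j, k])
                den += M[i, k]
            P[i, j] = num / den if den > 0 else 0.0
    return P
```

Note that $P$ is not symmetric in general, since the denominator normalizes only over the profile of term $i$; when two terms co-occur identically with all other terms, $P(i,j) = 1$.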