Negative Logits
adding        0.64
IF            0.57
kontrol       0.57
ח             0.57
اب            0.56
defined       0.54
determining   0.54
telefono      0.54
aint          0.53
definisi      0.53
Positive Logits
ї             0.62
toothbrush    0.61
toothpaste    0.58
Jinping       0.56
Você          0.55
𝐕             0.54
brushing      0.54
}";           0.53
Rosh          0.53
Premier       0.53
Activations Density 0.003%