INDEX
Explanations
phrases that indicate inclusion or presence within groups
Negative Logits
Tash: -0.64
்கள்: -0.64
HPV: -0.63
Peshawar: -0.63
Othello: -0.63
R: -0.61
chec: -0.59
Ra: -0.59
L: -0.57
vbox: -0.57
Positive Logits
AMONG: 1.48
Amongst: 1.44
among: 1.36
Among: 1.33
among: 1.32
amongst: 1.31
Among: 1.16
parmi (French for "among"): 1.05
Среди (Russian for "Among"): 0.98
среди (Russian for "among"): 0.96
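The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way to build such lists for a sparse autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest scores. The sketch below assumes that method, which this page does not itself confirm. The function names `logit_effects` and `top_and_bottom`, the 2-dimensional residual stream, and every number in the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumed method: dot product of the feature's decoder direction
    # with each token's unembedding column gives its direct logit effect.
    return {tok: sum(a * b for a, b in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Highest scores form the "positive logits" list, lowest the "negative logits" list.
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Hypothetical toy data: 2-d residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    "among": [1.0, 0.4],
    "parmi": [0.8, 0.2],
    "vbox": [-0.5, 0.0],
    "HPV": [-0.2, -0.4],
}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```

Under this reading, a token like "among" ranks high because its unembedding column points in roughly the same direction as the feature's decoder vector.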
Activation density: 0.019%
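Activation density is typically read as the share of token positions in a sample dataset on which the feature's activation is above zero. At 0.019%, this feature would fire on roughly 1 token in 5,000. The sketch below assumes that definition and a threshold of 0. The function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample to avoid division by zero.
    return 100.0 * active / total if total else 0.0

# Hypothetical sample: the feature fires at 2 of 10,000 positions.
acts = [0.0] * 10_000
acts[17] = 3.1
acts[9_420] = 1.4
print(f"{activation_density(acts):.3f}%")
```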