INDEX
Explanations
references to statements and remarks made by individuals
Negative Logits
786       -0.17
jian      -0.17
ialis     -0.17
Margins   -0.17
uida      -0.16
Harding   -0.15
olph      -0.15
ush       -0.15
loom      -0.14
кин       -0.14
Positive Logits
(#)        0.16
abol       0.16
æijĺ       0.15
utilus     0.15
ritz       0.14
neider     0.14
abay       0.14
orde       0.14
semicolon  0.14
atoon      0.13
Activations Density 0.027%