Explanations
terms associated with classification or categorization
Negative Logits
én      -0.15
adium   -0.15
witter  -0.14
ADX     -0.14
PTS     -0.14
atos    -0.14
nos     -0.14
Morris  -0.14
ibu     -0.14
esson   -0.14
Positive Logits
門      0.17
iki     0.17
acid    0.16
酸      0.15
roman   0.15
Acid    0.15
λι      0.15
pun     0.15
acids   0.14
iyan    0.14
Activations Density 0.027%