INDEX
Explanations
mentions of specific names, particularly of individuals associated with specific events or organizations
Negative Logits
▬▬        -1.08
EMENT     -1.08
IBLE      -1.02
══        -1.00
闘        -0.99
FANTASY   -0.99
LECT      -0.98
MENT      -0.93
BALL      -0.92
FOX       -0.91
Positive Logits
aji       1.53
orne      1.35
osh       1.24
orns      1.20
arna      1.20
oul       1.18
alid      1.17
azard     1.16
ahah      1.16
ush       1.16
Activations Density 1.073%