INDEX
Explanations
words related to political figures and actions in a historical context
Negative Logits
og             -0.18
pir            -0.15
vailability    -0.15
Sanayi         -0.15
odor           -0.15
och            -0.14
Phonetic       -0.14
ESCO           -0.14
Accordingly    -0.14
ÑĥÑĩа          -0.14
Positive Logits
Det            0.27
Den            0.25
En             0.23
De             0.21
Sed            0.21
Inn            0.20
Det            0.20
Result         0.20
Om             0.20
Under          0.19
Activations Density: 0.039%