Explanations
references to high-ranking officials and government positions
Negative Logits
INTERRUPTION    -0.16
ombres          -0.16
atan            -0.16
konkrét         -0.15
ráf             -0.15
anford          -0.15
bilt            -0.15
iVar            -0.14
æħ¶             -0.14
radu            -0.14
Positive Logits
head            0.17
esson           0.16
oeff            0.15
chief           0.15
current         0.15
éģĵ             0.15
assi            0.15
lead            0.14
god             0.14
<$>             0.13
Activations Density: 0.048%