INDEX
Explanations
references to national or political structures and their characteristics
Negative Logits
isches    -0.18
licher    -0.18
liches    -0.17
آخر       -0.17
erto      -0.17
eres      -0.16
quello    -0.15
який      -0.15
.mj       -0.15
ngen      -0.15
Positive Logits
“She         0.26
"She         0.22
elige        0.21
neue         0.20
erste        0.19
lige         0.19
ganze        0.19
große        0.19
française    0.18
andere       0.18
Activations Density 0.056%