INDEX
Explanations
phrases indicating a political context or position
Negative Logits
Fare        -0.16
fare        -0.16
Expansion   -0.15
aro         -0.15
aru         -0.14
isay        -0.14
expansion   -0.14
olas        -0.13
él          -0.13
Mas         -0.13
Positive Logits
kola        0.16
argar       0.15
azı         0.14
ë²Ķ         0.14
bergen      0.14
azzi        0.14
opsis       0.14
iben        0.14
gard        0.14
adesh       0.13
Activations Density: 0.003%