Explanations
phrases related to political positions and stances
Negative Logits
ennen    -0.16
опри     -0.16
ontent   -0.15
ONT      -0.15
ũ        -0.14
ONGO     -0.14
oufl     -0.14
ょ       -0.14
··       -0.13
็ตาม     -0.13
Positive Logits
on         0.77
trên       0.44
على        0.41
на         0.41
auf        0.37
på         0.36
regarding  0.35
pada       0.33
on         0.33
on         0.32
Activations Density: 0.148%