INDEX
Explanations
phrases related to political titles and affiliations
Negative Logits
oples      -0.15
خص         -0.15
.fhir      -0.15
央         -0.15
Mixin      -0.14
μο         -0.14
جل         -0.14
icks       -0.14
lanan      -0.14
oftware    -0.14
Positive Logits
airo       0.16
erras      0.16
··         0.15
šek        0.15
VECTOR     0.15
ade        0.14
ウェ       0.14
metab      0.14
vak        0.14
atz        0.14
Activations Density 0.013%