Explanations
references to nations, particularly in the context of government or geopolitical topics
Negative Logits
Hra      -0.17
ved      -0.16
éŃļ      -0.15
yon      -0.15
Güven    -0.15
aru      -0.14
ivet     -0.14
seg      -0.14
idas     -0.14
çŃĴ      -0.14
Positive Logits
enda     0.17
hod      0.15
abox     0.15
anoia    0.14
Fog      0.14
inth     0.14
çĽ       0.14
eyh      0.14
eper     0.14
diss     0.14
Activations Density 0.003%