INDEX
Explanations
expressions of opinion and belief related to various topics
Negative Logits
ÅĪ       -0.17
Vill     -0.17
Duy      -0.16
instein  -0.15
elda     -0.15
orp      -0.15
cycles   -0.15
xo       -0.15
wan      -0.14
apa      -0.14
Positive Logits
vore     0.18
chluss   0.15
erdem    0.15
Sez      0.14
adaki    0.14
228      0.14
Kurd     0.13
Haven    0.13
yses     0.13
aton     0.13
Activations Density: 0.110%