INDEX
Explanations
expressions of apology or regret
Negative Logits
alo              -0.15
alli             -0.15
possibilities    -0.15
irk              -0.15
elib             -0.14
ini              -0.14
ets              -0.14
ali              -0.14
stadt            -0.14
extrav           -0.13
Positive Logits
kus              0.19
813              0.19
about            0.16
apat             0.16
meant            0.15
éĮĦ              0.15
/not             0.15
isser            0.15
ably             0.15
for              0.15
Activations Density 0.021%