INDEX
Explanations
conversational exchanges and expressions of emotion
Negative Logits
ilde        -0.18
obao        -0.15
Fucking     -0.15
shall       -0.14
ihad        -0.14
adÃŃ        -0.14
imas        -0.14
usty        -0.14
£o          -0.14
freaking    -0.13
Positive Logits
’           0.22
dat         0.19
akin        0.19
kin         0.19
git         0.19
izin        0.19
dere        0.19
sich        0.18
kinda       0.18
è¾°         0.18
Activations Density 0.353%