INDEX
Explanations
dialogue and conversational responses in the text
Negative Logits
Gew          -0.16
Gamb         -0.15
bye          -0.15
DDL          -0.15
onica        -0.14
ifestyles    -0.14
rozen        -0.14
zych         -0.14
odied        -0.14
PLICATION    -0.14
Positive Logits
anky         0.17
alker        0.17
ellar        0.14
avy          0.14
ella         0.14
emin         0.14
息           0.13
大全         0.13
endi         0.13
ãİ           0.13
Activations Density 0.281%