INDEX
Explanations
characters and their roles, focusing on their behaviors and relationships in storytelling
Negative Logits
ilim          -0.17
.uni          -0.16
laughter      -0.16
arent         -0.15
ascar         -0.15
disrespect    -0.15
gnore         -0.15
-gnu          -0.14
iedo          -0.14
edback        -0.14
Positive Logits
alo           0.19
shl           0.19
cipher        0.19
tac           0.18
bro           0.17
clue          0.17
su            0.16
earn          0.16
foil          0.16
lik           0.16
Activations Density 0.120%