Explanations
elements related to characters and their actions in films
Negative Logits
_mirror    -0.14
rip        -0.14
pora       -0.14
hte        -0.14
presso     -0.14
rips       -0.13
ÑĥмÑĥ      -0.13
ستر        -0.13
Slave      -0.13
clist      -0.13
Positive Logits
handling   0.15
tal        0.15
ì½ľ        0.14
алÑĸв      0.14
hab        0.14
atoria     0.14
orp        0.14
297        0.14
467        0.14
oni        0.13
Activations Density 0.069%