Explanations
elements related to character choices and transformations in narratives
Negative Logits
Token     Logit
ape       -0.16
iform     -0.15
IRC       -0.15
ourcem    -0.14
ipers     -0.14
urban     -0.14
Circus    -0.14
|--       -0.14
orgia     -0.14
ung       -0.14
Positive Logits
Token     Logit
Frozen    0.33
Elsa      0.31
Frozen    0.28
Anna      0.21
frozen    0.21
Disney    0.20
Rap       0.20
Anna      0.20
snow      0.19
겨        0.19
Activations Density 0.007%