INDEX
Explanations
mentions of specific films, characters, and their creators
Negative Logits
‐
-0.14
'/')↵
-0.14
´s
-0.14
"/";↵
-0.13
../../../
-0.13
ислов
-0.13
ویژ
-0.13
__[
-0.13
''''
-0.13
ียน
-0.13
Positive Logits
"
0.62
'
0.57
“
0.50
«
0.49
‘
0.42
「
0.38
("
0.37
`
0.37
\"
0.35
「
0.34
Activations Density 0.712%