INDEX
Explanations
references to famous fictional characters and figures
references to legendary or iconic characters from literature and film
Negative Logits
VL          -0.75
ITCH        -0.72
UB          -0.71
TAM         -0.71
AMY         -0.68
hold        -0.68
together    -0.68
RU          -0.68
æĢ          -0.66
onder       -0.66
Positive Logits
imperson    0.96
Mansion     0.81
hler        0.76
memor       0.75
spoof       0.74
imitation   0.74
parody      0.73
cartoons    0.72
movies      0.72
analogue    0.72
Activation Density: 0.132%