Explanations
references to television shows and their characters
Negative Logits
Token      Logit
Yao        -0.18
Dao        -0.17
Kos        -0.16
Erdogan    -0.15
Panda      -0.15
panda      -0.15
Sok        -0.15
TOK        -0.15
Yuri       -0.15
kos        -0.14
Positive Logits
Token      Logit
Psycho     0.37
Norman     0.35
Bates      0.35
Psy        0.33
Hitch      0.32
Norm       0.30
Norm       0.30
Marion     0.27
Perkins    0.27
psycho     0.26
Activation Density: 0.006%
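The following is a minimal sketch of how lists like the ones above could be produced. It assumes (this is not stated in the dashboard) that each token's logit value is the dot product of the feature's decoder direction with that token's unembedding vector, and that activation density is the percentage of token positions where the feature's activation is above zero. The function names and the toy vectors are hypothetical and are not any real library's API.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_k(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending by logit
    return negative, positive


def activation_density(acts: Sequence[float]) -> float:
    """Percentage of token positions where the feature fires (activation > 0)."""
    if not acts:
        return 0.0
    return 100.0 * sum(1 for a in acts if a > 0) / len(acts)


# Toy, made-up numbers for illustration only (not the real model's weights).
decoder = [1.0, 0.0]
vocab = {"Psycho": [0.4, 1.0], "Bates": [0.3, -1.0], "Yao": [-0.2, 0.5]}
neg, pos = top_k(logit_effects(decoder, vocab), k=1)
print(neg, pos)
print(activation_density([0.0, 0.0, 0.5, 0.0]))
```

Under this reading, a density of 0.006% would mean the feature is active on roughly 6 of every 100,000 tokens.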