INDEX
Explanations
names of directors and films
Negative Logits
-0.32
S
-0.28
g
-0.27
c
-0.27
A
-0.26
W
-0.26
w
-0.26
"
-0.26
C
-0.26
(
-0.26
Positive Logits
í
0.30
á
0.29
ý
0.28
ín
0.27
íž
0.25
ě
0.25
íd
0.25
ír
0.25
ů
0.24
íř
0.24
Activations Density 0.026%