Explanations
names of actors and their roles in movies
Negative Logits
Token      Logit
ylko       -0.17
fone       -0.16
ulary      -0.15
ebo        -0.14
gota       -0.14
venir      -0.14
idon       -0.14
pliers     -0.14
adro       -0.14
ÃŁen       -0.13
Positive Logits
Token      Logit
cul        0.16
Gods       0.15
(“         0.15
Hav        0.14
Americas   0.14
leh        0.13
.UR        0.13
mar        0.13
erv        0.13
Trib       0.13
Activation Density: 0.182%