INDEX
Explanations
instances of actors or characters being referred to or described in movies or shows
Negative Logits
pte       -0.07
.rd       -0.07
otre      -0.07
etur      -0.06
é«        -0.06
agal      -0.06
amework   -0.06
ponge     -0.06
xon       -0.06
ete       -0.06
Positive Logits
iten        0.07
alsy        0.07
387         0.06
incre       0.06
204         0.06
incr        0.06
ITH         0.06
elligence   0.06
UDA         0.06
arded       0.06
Activations Density 0.002%