INDEX
Explanations
references to movie genres, particularly those related to action and adventure
Negative Logits
èĮĤ  -0.15
ucci  -0.14
uta  -0.14
ft  -0.14
äm  -0.14
Reputation  -0.14
assel  -0.14
วà¸ĩ  -0.13
imoto  -0.13
soever  -0.13
Positive Logits
íĸ¥  0.16
.ejb  0.16
rido  0.15
iyet  0.15
mens  0.15
Jennings  0.15
Neighbors  0.14
Ý  0.14
CÆ¡  0.14
inho  0.14
Activations Density 0.002%