INDEX
Explanations
references to popular movie titles and characters, particularly those associated with the "Back to the Future" and "Indiana Jones" franchises
Negative Logits
raith         -0.16
2             -0.15
191           -0.15
CSA           -0.14
umar          -0.14
hop           -0.14
convention    -0.14
rita          -0.14
Celt          -0.14
1             -0.14
Positive Logits
ysl               0.16
rzy               0.15
orderby           0.15
odense            0.15
ffer              0.15
APPLE             0.14
istrovství        0.14
Nimbus            0.14
::::::::::::::    0.14
#af               0.14
Activations Density 0.037%