Explanations
words related to movie titles or film-related terms
mentions of specific geographic locations or countries
Negative Logits
tnc       -0.72
edIn      -0.70
helm      -0.68
blu       -0.63
Strat     -0.63
Reson     -0.63
Thur      -0.62
pretext   -0.61
opio      -0.61
Heads     -0.60
Positive Logits
pad        0.84
abad       0.83
mans       0.82
tub        0.79
afia       0.79
rost       0.76
edition    0.75
icz        0.75
pour       0.73
trump      0.71
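A minimal sketch of how logit lists like the two above are commonly produced, assuming the usual sparse-autoencoder dashboard convention: the feature's decoder direction is projected through the model's unembedding matrix, and the most negative and most positive entries are reported. The function names `logit_weights` and `top_logits`, the row-per-token layout of the unembedding, and the toy numbers are all hypothetical, chosen only to mirror a few of the listed tokens.

```python
def logit_weights(decoder_dir, unembed):
    # Hypothetical layout: `unembed` has one row per vocabulary token,
    # each row of length d_model. Each token's weight is the dot product
    # of its unembedding row with the feature's decoder direction.
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_logits(weights, vocab, k):
    # Sort token indices by weight (ascending), then take the k lowest
    # as "negative logits" and the k highest (largest first) as "positive logits".
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    negative = [(vocab[i], weights[i]) for i in order[:k]]
    positive = [(vocab[i], weights[i]) for i in reversed(order[-k:])]
    return negative, positive


# Toy 2-dimensional example (hypothetical values, not real model weights).
vocab = ["pad", "abad", "tnc", "edIn", "the"]
decoder_dir = [1.0, 0.0]
unembed = [[0.84, 0.1], [0.83, 0.0], [-0.72, 0.3], [-0.70, 0.0], [0.0, 1.0]]

negative, positive = top_logits(logit_weights(decoder_dir, unembed), vocab, k=2)
print(negative)  # [('tnc', -0.72), ('edIn', -0.7)]
print(positive)  # [('pad', 0.84), ('abad', 0.83)]
```

Because only the direction of the feature matters here, the lists describe which output tokens the feature would push up or down if it fired, independent of how often it actually fires.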
Activations Density 0.000%
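Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero, so 0.000% suggests the feature fired on essentially none of the tokens sampled. A minimal sketch under that assumed definition; the function name `activation_density` and the zero threshold are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


print(f"{activation_density([0.0, 0.0, 0.0, 2.5]):.3f}%")  # prints 25.000%
```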