INDEX
Explanations
titles and references to movies
references to movies
Negative Logits
raham       -0.71
urst        -0.68
²¾          -0.67
ilities     -0.67
utter       -0.66
existent    -0.66
bleacher    -0.66
Haitian     -0.66
ĵĺ          -0.66
İĭ          -0.65
Positive Logits
theaters    1.08
theater     1.06
goers       1.04
movies      1.03
movie       0.96
theatre     0.96
Movies      0.89
theat       0.87
buffs       0.86
eers        0.86
Activations Density: 0.025%