Explanations
mentions of the "Star Wars" franchise
Negative Logits
Token     Logit
erson     -0.17
andum     -0.17
adesh     -0.17
yses      -0.16
enne      -0.16
gens      -0.16
kaz       -0.15
iser      -0.15
ÑĨÑĮ      -0.15
eday      -0.15
Positive Logits
Token     Logit
Wars      0.29
ry        0.27
Trek      0.25
wars      0.22
ship      0.22
vation    0.22
bucks     0.21
Wars      0.21
ling      0.20
burst     0.20
Activations Density: 0.012%