INDEX
Explanations
references to "Star Wars" in various contexts
Negative Logits
elho     -0.18
enne     -0.17
erson    -0.17
kaz      -0.17
ymous    -0.17
yses     -0.17
go       -0.16
estro    -0.15
abler    -0.15
گاÙĩ     -0.15
Positive Logits
Wars     0.35
Trek     0.29
wars     0.28
Wars     0.28
ship     0.25
ry       0.22
vation   0.21
trek     0.21
ships    0.20
ategy    0.20
Activations Density: 0.010%