Explanations
references to specific movies or franchises
Negative Logits
mouth       -0.87
mel         -0.79
onga        -0.72
strate      -0.65
inker       -0.65
inges       -0.64
phys        -0.64
gling       -0.64
istics      -0.64
ogly        -0.64
Positive Logits
Awakens     0.74
ÃįÃį        0.73
âĦ¢:        0.72
hler        0.70
Episode     0.70
Skywalker   0.70
itialized   0.68
Galaxy      0.66
film        0.65
galaxy      0.64
Activations Density: 0.010%
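Two of the positive-logit tokens (ÃįÃį and âĦ¢:) look like mojibake but are more likely raw byte-level BPE token strings. The sketch below shows how such strings could be mapped back to readable text. It assumes the underlying model uses a GPT-2 style byte-level tokenizer, which the page itself does not state; `decode_token` and `BYTE_DECODER` are hypothetical helper names introduced only for this illustration.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Tokens can end mid-character, so invalid bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĦ¢:", "ÃįÃį", "Skywalker"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, âĦ¢: decodes to the trademark sign followed by a colon, and ÃįÃį decodes to two copies of Í, which may itself be mis-encoded text from the training data. Plain ASCII tokens such as Skywalker pass through unchanged.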