INDEX
Explanations
phrases related to specific movies, especially those in a particular series
references to movie titles and their sequels
Negative Logits
Token        Logit
trough       -0.77
hole         -0.75
pockets      -0.73
misfortune   -0.72
negatively   -0.69
outp         -0.68
fue          -0.67
adversely    -0.67
worsen       -0.66
milit        -0.65
Positive Logits
Token        Logit
Detail       0.84
Intent       0.84
Machines     0.82
Colossus     0.80
Kings        0.80
Species      0.80
Childhood    0.79
Osiris       0.76
Cortex       0.75
Ancients     0.74
Activations Density: 0.081%
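
The page does not say how the density figure is computed. A minimal sketch follows, assuming it is the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.5] * 8
density = activation_density(sample)

# Format as a percentage, in the same style as the figure on the page.
print(f"{density:.3%}")
```

Under this reading, a density of 0.081% would mean the feature fires on roughly 8 of every 10,000 tokens, which fits a feature tied to a narrow topic such as movie titles and their sequels.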