INDEX
Explanations
phrases indicating future events or releases
occurrences of the word "coming."
Negative Logits
ording       -0.71
sav          -0.68
aundering    -0.67
ting         -0.67
ishes        -0.66
cius         -0.66
tops         -0.65
claimer      -0.65
picking      -0.64
fur          -0.64
Positive Logits
undone       0.99
Soon         0.90
apart        0.89
attractions  0.85
together     0.81
forward      0.79
closer       0.78
up           0.77
forth        0.77
Soon         0.77
Activations Density 0.041%