INDEX
Explanations
phrases where an action is required as a first step
instances of the word "first"
Negative Logits
Gould        -0.69
holes        -0.66
ourge        -0.66
Canaver      -0.65
oute         -0.61
ucl          -0.61
FORMATION    -0.60
morph        -0.60
sav          -0.57
Tant         -0.57
Positive Logits
responders   1.12
baseman      1.07
glance       0.81
impressions  0.80
ancest       0.80
blush        0.79
lady         0.75
impression   0.73
accuser      0.73
glim         0.72
Activations Density: 0.064%