Explanations
introductory sentences or sections
Negative Logits
tics       -0.71
Canaver    -0.71
sung       -0.67
Gould      -0.63
vor        -0.61
roller     -0.60
sav        -0.60
rolet      -0.59
aths       -0.59
cling      -0.59
Positive Logits
responders     1.33
baseman        1.18
impressions    0.96
glance         0.95
blush          0.93
foray          0.89
born           0.84
impression     0.81
lady           0.81
glimpse        0.80
Activations Density: 1.924%