INDEX
Explanations
phrases describing initial actions, thoughts, or observations
references to the concept of "first impressions" or initial reactions
Negative Logits
vind         -0.70
yet          -0.68
still        -0.67
aden         -0.66
recy         -0.63
RW           -0.63
GF           -0.61
lov          -0.61
claimer      -0.60
Canad        -0.60
Positive Logits
responders    0.83
noticed       0.82
visitors      0.72
encountered   0.71
asma          0.67
inevitably    0.67
introdu       0.67
alerted       0.66
noticing      0.66
reaction      0.66
Activation Density: 0.086%
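The logit lists and the density figure are summary statistics of a single feature. A common convention in sparse-autoencoder dashboards, assumed here rather than confirmed for this one, is to score each vocabulary token by the dot product of the feature's decoder direction with that token's unembedding vector, then report the highest and lowest scores. Activation density is the fraction of token positions where the feature's activation is above zero. The sketch below uses tiny hypothetical inputs (`decoder_dir`, `unembed`, `vocab`) in plain Python. It is only an illustration, not this dashboard's actual pipeline, which may also apply normalization.

```python
from typing import List, Sequence, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int) -> Tuple[Scored, Scored]:
    """Score every token by decoder_dir . unembed[:, t] (assumed convention).

    `unembed` is a hypothetical d_model x vocab_size matrix, indexed as
    unembed[i][t]. Returns (top-k positive, top-k negative) token/score pairs.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending: the most negative effects come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked[-k:]))  # largest score first
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of token positions where the feature is active (> 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Toy, made-up inputs; they are not taken from the dashboard above.
    vocab = ["noticed", "yet", "visitors", "still"]
    decoder_dir = [1.0, -0.5]
    unembed = [[0.8, -0.6, 0.7, -0.5],
               [0.0, 0.2, 0.1, 0.3]]
    pos, neg = logit_effects(decoder_dir, unembed, vocab, k=2)
    print(pos)
    print(neg)
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

On these toy inputs, "noticed" and "visitors" rank as the top positive tokens and "yet" and "still" as the top negative ones. The density reported above, 0.086%, would correspond to a fraction of 0.00086 under this definition.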