INDEX
Explanations
references to emotional responses and personal concerns
Negative Logits
/         -0.38
auto      -0.32
to        -0.30
ings      -0.29
od        -0.29
Cowan     -0.29
screen    -0.28
'         -0.28
via       -0.28
auto      -0.28
Positive Logits
considerably     0.99
greatly          0.97
significantly    0.96
immensely        0.89
slightly         0.88
slightly         0.87
appreciably      0.84
massively        0.83
tremendously     0.82
substantially    0.82
Activations Density: 0.381%