INDEX
Explanations
positive responses or agreements
affirmative responses to questions or statements
Negative Logits
sing        -0.77
drawn       -0.66
abal        -0.62
reverted    -0.61
ded         -0.59
ylan        -0.58
mourning    -0.56
sling       -0.56
bearer      -0.56
ined        -0.55
Positive Logits
terday      1.03
sir         0.98
Absolutely  0.79
YES         0.76
Absolutely  0.76
Answer      0.75
yes         0.74
Nope        0.72
!,          0.72
yes         0.70
Activations Density 0.111%