Explanations
statements where a definitive answer is provided
statements emphasizing definitive conclusions or answers
Negative Logits
Token         Logit
eatures       -0.71
rongh         -0.70
schild        -0.64
icipated      -0.62
capacities    -0.62
vre           -0.61
allowed       -0.60
ivities       -0.60
astical       -0.60
activities    -0.59
Positive Logits
Token         Logit
YES           1.14
yes           1.14
YES           0.88
unequiv       0.87
nil           0.87
yes           0.85
affirmative   0.83
NO            0.81
Nope          0.79
nowhere       0.78
Activations Density 0.101%