INDEX
Explanations
phrases related to conditions or consequences of certain actions or beliefs
instances of the word "don't" or its variations
Negative Logits
Site        -0.71
Alleg       -0.68
Balanced    -0.67
OSP         -0.67
Policies    -0.64
Starts      -0.61
Gall        -0.61
Strategy    -0.60
Palest      -0.60
inia        -0.59
Positive Logits
bother        0.94
succeed       0.94
theless       0.85
urtles        0.84
comply        0.82
appreciate    0.81
recognize     0.81
realize       0.80
necessarily   0.79
realise       0.79
Activations Density 0.083%