INDEX
Explanations
phrases related to criticizing actions or statements
references to accountability or consequences associated with actions or behaviors
Negative Logits
wolves          -0.70
sources         -0.68
alternatives    -0.67
FIELD           -0.64
fellows         -0.62
ods             -0.61
andals          -0.61
kees            -0.60
ergic           -0.58
Options         -0.58
Positive Logits
every           1.91
every           1.84
Every           1.68
Every           1.67
EVERY           1.55
each            1.22
Each            1.18
each            1.17
Each            1.13
everybody       1.01
Activations Density: 0.275%