INDEX
Explanations
negative and critical phrases
phrases indicating uncertainty or conditionality
Negative Logits
Address         -0.75
address         -0.74
Division        -0.74
margin          -0.71
rotor           -0.71
avenue          -0.71
entry           -0.70
Tier            -0.69
Union           -0.69
ratification    -0.69
POSITIVE LOGITS
your            1.43
same            1.37
every           1.36
shit            1.32
everything      1.32
another         1.29
enough          1.27
the             1.27
anything        1.26
nature          1.25
Activations Density: 0.097%