INDEX
Explanations
instances where a preferred action or choice is made over an alternative
Negative Logits
landmark       -0.69
Previously     -0.64
milestone      -0.61
particularly   -0.60
estamp         -0.60
anta           -0.60
anniversary    -0.59
exceeds        -0.58
mentioned      -0.57
Earlier        -0.57
Positive Logits
merely         1.19
concentrate    1.08
simply         1.00
Instead        0.90
purely         0.89
relying        0.86
foc            0.86
instead        0.85
bland          0.81
focus          0.81
Activations Density: 3.978%