Explanations
Queries seeking clarification or definitions of terms and concepts.
Negative Logits
only         -0.91
nowhere      -0.87
only         -0.82
Only         -0.77
Only         -0.73
never        -0.71
apenas       -0.70
never        -0.70
nothing      -0.70
seldom       -0.68
Positive Logits
exactly      1.49
exactly      1.29
exactamente  1.25
exactement   1.24
Exactly      1.17
precies      1.11
actually     1.09
esattamente  1.06
EXACTLY      1.04
ACTUALLY     1.04
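The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Below is a minimal sketch of one common way such lists are produced. It assumes, and this page does not confirm, that each token's score is the feature's decoder vector projected through the model's unembedding matrix. The helper name `logit_effects`, the row-major `d_model x vocab_size` layout of `w_unembed`, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_vec: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score each token by decoder_vec @ w_unembed.

    Assumes w_unembed has d_model rows and len(vocab) columns.
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    d_model = len(decoder_vec)
    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_vec[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive

# Toy example: d_model = 2, four-token vocabulary (illustrative numbers only).
vocab = ["only", "never", "exactly", "the"]
w_unembed = [
    [-0.5, -0.4, 1.0, 0.0],
    [-0.2, -0.2, 0.4, 0.1],
]
neg, pos = logit_effects([1.0, 0.5], w_unembed, vocab, k=2)
print(neg)
print(pos)
```

With these toy weights, "only" and "never" land in the negative list and "exactly" tops the positive list, which mirrors the shape of the real lists above without reproducing their values.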
Activation Density: 0.559%
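Activation density is read here, as an assumption about how this page computes it, as the percentage of tokens on which the feature activates above zero. At 0.559% that works out to roughly 5.6 active tokens per 1,000. Below is a minimal sketch. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)

# Ten tokens, two of which activate the feature.
density = activation_density([0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2])
print(density)  # 20.0
```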