INDEX
Explanations
quantitative comparisons or quantities
instances of the word "more" across various contexts
Negative Logits
Prediction   -0.71
Everywhere   -0.67
wherever     -0.63
Thomson      -0.60
uca          -0.60
Eye          -0.60
OX           -0.58
Evening      -0.58
xtap         -0.57
selves       -0.57
Positive Logits
than           1.39
than           1.13
mundane        1.02
stringent      1.02
recent         0.98
affluent       0.97
advanced       0.97
esoteric       0.94
sophisticated  0.93
obscure        0.91
Activations Density: 0.092%