INDEX
Explanations
sentences or phrases that end with a period followed by a numerical value
instances of statistical claims or statements about events
Negative Logits
tremend      -0.96
metic        -0.77
charism      -0.73
derog        -0.72
emot         -0.69
bryce        -0.69
perspect     -0.68
skelet       -0.68
simultane    -0.67
princ        -0.67
Positive Logits
Months       0.93
Thousands    0.92
Yet          0.89
But          0.88
Thousands    0.88
Instead      0.88
But          0.87
↵            0.87
Few          0.86
Newly        0.85
Activations Density 0.216%