Explanations
words and phrases related to counting and numerical descriptions
NEGATIVE LOGITS
Token          Logit
obser          -0.71
misunder       -0.69
Harbor         -0.67
conditioning   -0.63
constitu       -0.62
insula         -0.61
educ           -0.60
Goods          -0.59
SPONSORED      -0.59
appre          -0.59
POSITIVE LOGITS
Token          Logit
enance          1.82
downs           0.92
oleon           0.78
calories        0.76
ENCY            0.76
rified          0.75
down            0.73
omp             0.70
rows            0.69
antine          0.68
Activation density: 0.011%