Explanations
sentences related to counting or measuring
Negative Logits
ieties        -0.66
Oracle        -0.65
conditioning  -0.64
obser         -0.63
Harbor        -0.62
Bree          -0.62
waivers       -0.61
insula        -0.61
Centauri      -0.60
habit         -0.59
Positive Logits
enance        1.78
downs         0.89
esses         0.87
rified        0.86
ess           0.82
ries          0.77
icates        0.77
ensen         0.76
icated        0.74
eenth         0.74
Activations Density: 0.629%