Explanations
numbers in a specific range
phrases indicating a quantity or number of entities
Negative Logits
bet        -0.63
fert       -0.61
uala       -0.59
mixture    -0.59
Effects    -0.58
blend      -0.58
hunt       -0.58
sensit     -0.57
kale       -0.55
agher      -0.55
Positive Logits
uits          0.82
ources        0.74
dozen         0.70
ij士          0.68
icial         0.64
presidents    0.63
course        0.62
majors        0.61
consecutive   0.61
neys          0.61
Activations Density 0.087%