Explanations
common articles like "a" or "an"
articles and quantifiers preceding nouns
Negative Logits
bis        -0.74
anism      -0.68
Anim       -0.66
onlook     -0.65
assisted   -0.64
inel       -0.64
ATS        -0.62
OWS        -0.61
ercise     -0.61
arten      -0.59
Positive Logits
knack         1.38
tendency      1.35
penchant      1.19
reputation    1.18
propensity    1.16
chance        1.05
vested        1.04
lot           0.95
plethora      0.92
tremendous    0.91
Activations Density: 0.133%
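The page does not say how these numbers are produced. A common convention in SAE feature dashboards, assumed here and not confirmed by this page, is that the logit lists come from projecting the feature's decoder vector through the model's unembedding matrix and taking the k most negative and most positive tokens, and that activation density is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below shows that assumed computation on toy arrays. The function names `top_logit_effects` and `activation_density`, as well as their parameters, are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction
    through the unembedding and return the k largest and k smallest
    logit contributions as (token, value) pairs.

    w_dec_row: (d_model,) decoder vector for the feature (assumed input)
    W_U:       (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:     list of d_vocab token strings
    """
    logits = w_dec_row @ W_U          # shape (d_vocab,)
    order = np.argsort(logits)        # ascending order of logit value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

def activation_density(acts):
    """Hypothetical helper: fraction of tokens where the activation is > 0."""
    acts = np.asarray(acts)
    return float(np.mean(acts > 0))
```

On a dashboard like this one, a density of 0.133% would mean the feature is nonzero on roughly 1 to 2 tokens per thousand, under the assumed definition.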