INDEX
Explanations
entities related to health and medical conditions
instances of proper nouns, particularly names and specific terms
Negative Logits
Token       Logit
ed          -0.70
silenced    -0.68
e           -0.65
staking     -0.65
dar         -0.65
condensed   -0.62
LY          -0.60
ĪĴ          -0.59
Madison     -0.59
Bundy       -0.59
Positive Logits
Token       Logit
atform      1.18
opl         1.13
asts        1.10
asso        1.04
asms        1.01
ifting      1.00
thora       0.95
ases        0.95
ifts        0.95
uten        0.92
Activations Density 0.015%