INDEX
Explanations
words related to medical conditions
instances of the word "ill."
Negative Logits
Snap      -0.73
beta      -0.67
SNAP      -0.66
gamma     -0.66
Gamma     -0.64
Zhou      -0.63
Schwarz   -0.62
ABE       -0.61
adopted   -0.60
Subway    -0.60
Positive Logits
ill       4.37
ills      2.46
illation  2.28
illed     2.16
ILL       2.14
illing    2.12
illian    2.01
illi      2.01
iller     1.96
illin     1.95
Activations Density: 0.008%