INDEX
Explanations
proper nouns
specific names or identifiers related to people or entities
Negative Logits
Lead       -0.85
iron       -0.81
Iron       -0.81
iron       -0.79
Able       -0.75
Copper     -0.68
malaria    -0.67
arsenic    -0.66
Stick      -0.66
Solomon    -0.66
POSITIVE LOGITS
yy         4.16
YY         2.48
nn         1.82
eeee       1.70
NN         1.36
kk         1.36
aaaa       1.33
yah        1.30
hhhh       1.26
aaa        1.26
Activations Density: 0.035%