INDEX
Explanations
the word "element" with a high activation value
the concept of "element" in various contexts
Negative Logits
sburgh    -0.96
lishing   -0.72
pr        -0.69
blems     -0.67
blem      -0.65
claimed   -0.65
ulative   -0.65
ply       -0.63
trans     -0.63
ever      -0.63
Positive Logits
element   1.38
elements  1.17
Element   0.96
"$:/      0.93
icide     0.83
idable    0.82
abund     0.81
element   0.79
icides    0.74
osaurs    0.73
Activations Density: 0.007%