INDEX
Explanations
words related to physical objects and actions happening to them
specific nouns and technical terms related to mechanisms and structures
Negative Logits
Token       Logit
Ranked      -0.75
;;;;        -0.74
Helpful     -0.67
Interested  -0.67
Know        -0.66
ï¸          -0.65
Own         -0.65
ECA         -0.62
Ranked      -0.62
edIn        -0.62
Positive Logits
Token       Logit
disappears  1.08
iest        0.93
becomes     0.93
osphere     0.87
ceases      0.85
explodes    0.84
goes        0.84
portion     0.83
fades       0.83
wasn        0.83
Activations Density 0.664%
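
The density figure above is not defined on this page. A common reading, assumed here, is the share of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of active positions; the source
    dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 7 of which are active.
sample = [0.0] * 993 + [0.4, 1.2, 0.1, 2.5, 0.9, 0.3, 1.7]
print(f"{activation_density(sample):.3%}")  # prints 0.700%
```

Under this reading, a density of 0.664% would mean the feature fires on roughly 6 to 7 of every 1,000 sampled tokens.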