Explanations
The neuron seems to activate on sentences starting with "I know I", across varying levels of certainty and emotional tone.
Phrases expressing personal knowledge or experiences.
Negative Logits
BuyableInstoreAndOnline    -0.74
etheus                     -0.63
Âł Âł                      -0.60
Cancel                     -0.60
opes                       -0.59
rex                        -0.58
resume                     -0.58
anship                     -0.57
arantine                   -0.57
eworthy                    -0.56
Positive Logits
)</          0.72
sb           0.71
PLA          0.70
firsthand    0.67
CLA          0.64
stadt        0.63
plenty       0.61
anza         0.61
ById         0.60
%);          0.59
Activations Density 0.271%
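
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled tokens on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: 3 active tokens out of 1000.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.271% would mean the neuron fires on roughly 27 of every 10,000 sampled tokens.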