INDEX
Explanations
Neuron 4 activates particularly on the word "quirk" in various contexts.
words related to quirks or oddities, particularly in reference to people or behavior
Negative Logits
hur          -0.68
rade         -0.64
iam          -0.63
amaz         -0.62
ÙĴ           -0.62
gang         -0.62
sword        -0.60
âĺħâĺħ       -0.59
++++         -0.59
belt         -0.59
Positive Logits
conservancy  0.87
earance      0.82
enhagen      0.81
atcher       0.78
utic         0.75
adem         0.73
opsis        0.69
ensable      0.69
lihood       0.67
Cipher       0.67
Activations Density 0.050%
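A minimal sketch of how a density figure like the one above could be computed, assuming density means the fraction of sampled tokens on which the neuron's activation is above zero. The function name, the threshold parameter, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds the threshold.

    Assumption: "density" is the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active token out of 2,000 sampled tokens.
sample = [0.0] * 1999 + [3.2]
print(f"{activation_density(sample):.3%}")  # prints 0.050%
```

Under that assumption, a density of 0.050% would mean the neuron fires on roughly one token in every 2,000.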