Explanations
The main thing this neuron does is detect the beginning of first-person statements or observations, particularly those starting with "I" or "And I", or similar phrases such as "it was" or "it's". It seems to activate strongly at the start of personal opinions, reflections, or references to individual identity and personal experiences.
gender-related discussions and possibly LGBTQ+ content.
"I" followed by a verb
"I" followed by verbs
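Negative Logits
Token          Logit   Gloss
however        -0.95
however        -0.90
évaluateur     -0.79   French, "evaluator"
echter         -0.77   Dutch, "however"
However        -0.76
jednak         -0.70   Polish, "however"
však           -0.66   Czech/Slovak, "however"
However        -0.66
porém          -0.65   Portuguese, "however"
però           -0.64   Italian, "however"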
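Positive Logits
Token             Logit   Gloss
correspondingly   1.11
accordingly       1.10
consequently      1.05
何より            1.05    Japanese, "above all"
consequent        0.91
derfor            0.87    Danish/Norwegian, "therefore"
frankly           0.87
certainly         0.85
dermed            0.85    Danish/Norwegian, "thereby, thus"
соответственно    0.84    Russian, "accordingly"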
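The page does not say how the positive and negative logit lists are produced. One common approach, assumed here, is direct logit attribution: take the dot product of the neuron's output weight vector with each token's unembedding vector, then rank tokens by that score. The sketch below ignores the final layer norm that real models apply, and the names `w_out`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(w_out: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the neuron's (hypothetical)
    output weights with that token's unembedding vector.
    Assumption: no final layer norm or other scaling is applied."""
    return {tok: sum(w * u for w, u in zip(w_out, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most-promoted tokens (highest score first) and the
    k most-suppressed tokens (lowest score first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative scores
    top = ranked[::-1][:k]     # most positive scores
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional example; real vectors have d_model entries.
    w_out = [1.0, -0.5]
    unembed = {
        "accordingly": [1.0, 0.0],
        "however": [-1.0, 0.2],
        "certainly": [0.5, 0.5],
        "the": [0.0, 0.0],
    }
    top, bottom = top_and_bottom(logit_effects(w_out, unembed), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

In this toy setup "accordingly" is promoted most and "however" is suppressed most, mirroring the shape (not the values) of the lists above.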
Activation density: 0.235%
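Activation density is read here as the fraction of token positions on which the neuron fires. The threshold is an assumption: the page does not state it, so the sketch below treats "fires" as "activation above 0.0" and exposes the cutoff as a parameter. At 0.235%, that would mean roughly 2,350 active positions per million tokens. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.
    Assumption: the viewer's notion of 'active' is activation > 0."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Avoid division by zero on an empty input.
    return active / total if total else 0.0


if __name__ == "__main__":
    # 1,000 toy positions with two activations above zero.
    acts = [0.0] * 998 + [1.5, 0.3]
    print(f"{activation_density(acts):.3%}")
```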