Explanations
The neuron fires whenever the standalone word “major” (in any capitalization) appears, especially as a heading or key descriptor.
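One way to test an explanation like this is to turn it into a predicate over text and compare its predictions with the neuron's real activations. The sketch below is a minimal version of that idea. It assumes that "standalone" means a whole word bounded by regex word boundaries. The names `MAJOR_PATTERN` and `predicts_activation` are hypothetical and not part of any dashboard API.

```python
import re

# Hypothetical simulator of the explanation: predicts the neuron fires on the
# standalone word "major" in any capitalization. The \b word boundaries
# (an assumption about what "standalone" means) exclude longer words such as
# "majority" or "majors".
MAJOR_PATTERN = re.compile(r"\bmajor\b", re.IGNORECASE)


def predicts_activation(text: str) -> bool:
    """Return True if the explanation predicts the neuron fires somewhere in `text`."""
    return MAJOR_PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = ["Major League Baseball", "a MAJOR update", "the majority view", "two majors"]
    for sample in samples:
        print(f"{sample!r}: {predicts_activation(sample)}")
```

Under this reading, "Major League Baseball" and "a MAJOR update" are predicted to fire, while "the majority view" and "two majors" are not. Real activations may disagree, and measuring that gap is the point of the check.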
Negative Logits
ินท  -0.07
nowledge  -0.06
weet  -0.06
بتن  -0.06
уп  -0.06
BED  -0.06
ñas  -0.06
seedu  -0.06
уются  -0.06
ETweet  -0.06
Positive Logits
major  0.15
Major  0.13
Major  0.11
major  0.10
majors  0.09
Maj  0.09
jur  0.08
MR  0.08
AJ  0.08
RR  0.08
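The page does not say how these logit lists are produced. A common approach, assumed here, is to take the dot product of the neuron's output direction with each token's unembedding vector and then rank tokens by that score. This sketch ignores layer norm and any other scaling a real model would apply. The function names and the toy 2-d vectors are invented for illustration and are not the values behind the lists above.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(out_direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the neuron's output direction
    with that token's unembedding vector (assumed method, no layer norm)."""
    return {
        token: sum(d * u for d, u in zip(out_direction, vector))
        for token, vector in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    positive = sorted(effects.items(), key=lambda item: item[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda item: item[1])[:k]
    return positive, negative


# Toy 2-d unembedding vectors, invented purely for illustration.
TOY_UNEMBED = {
    "alpha": [0.1, 0.1],
    "beta": [0.06, 0.06],
    "gamma": [-0.04, -0.04],
    "delta": [-0.02, -0.04],
}

if __name__ == "__main__":
    effects = logit_effects([1.0, 0.5], TOY_UNEMBED)
    positive, negative = top_and_bottom(effects, 2)
    print("positive:", positive)
    print("negative:", negative)
```

With the toy direction `[1.0, 0.5]`, "alpha" and "beta" rank as the positive logits and "gamma" and "delta" rank as the negative ones.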
Activation density: 0.018%
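To put 0.018% in perspective: if density means the share of tokens on which the neuron is active, the neuron fires on about 18 of every 100,000 tokens, or roughly 1 token in 5,600. The sketch below computes density that way. It assumes "active" means an activation strictly above a threshold of 0. The `activation_density` function is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`
    (the threshold of 0 is an assumption, not taken from the page)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 18 active tokens out of 100,000 gives a density of 0.018%.
    sample = [1.0] * 18 + [0.0] * 99_982
    print(f"{activation_density(sample):.3f}%")
```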