INDEX
Explanations
references to individuals and their affiliations
This neuron activates strongly on proper nouns and place names, particularly those that appear in citations, acknowledgments, and author attributions.
Negative Logits
שוליים         -0.67
snippetHide    -0.67
GenerationType -0.63
MessageOf      -0.63
enderror       -0.60
EconPapers     -0.60
exitRule       -0.58
addCriterion   -0.57
ब्रेकडाउन       -0.57
verwijspagina  -0.55
Positive Logits
opinion  0.37
Strö     0.35
opin     0.35
Mind     0.35
belo     0.34
%^       0.34
ele      0.33
Mind     0.33
Todes    0.33
gros     0.32
Activations Density: 0.312%