Explanations
The neuron activates on mentions of academic honors and distinctions (e.g. “Fellow,” “Prize,” “Award”).
Negative Logits
Token       Logit
boxes       -0.06
icester     -0.06
HK          -0.06
atar        -0.06
Mur         -0.06
أحمد        -0.06
alarmed     -0.06
bitmap      -0.06
sky         -0.06
Mad         -0.06
Positive Logits
Token       Logit
fellowship  0.11
Fellowship  0.11
Fellow      0.09
Fell        0.09
////////////////////////////////////////////////////////////////////  0.08
haps        0.07
.Patient    0.07
fell        0.07
ellow       0.07
Stevenson   0.07
Activations Density 0.002%
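The density figure can be read as the share of sampled tokens on which the neuron fires. The following is a minimal sketch, assuming (the page does not state this) that density means the percentage of tokens with a nonzero activation. The function name `activation_density` and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of tokens with nonzero activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active tokens out of 100,000.
sample = [0.0] * 99_998 + [1.3, 0.7]
print(f"{activation_density(sample):.3f}%")  # prints "0.002%"
```

Under that assumption, a density of 0.002% corresponds to roughly 2 active tokens per 100,000, which would fit a narrow feature such as one keyed to "Fellow" and "Fellowship".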