INDEX
Explanations
This neuron detects references to collective entities or societal groups, such as “people” or “government.”
Negative Logits
car         -0.07
兼          -0.07
콜걸        -0.07
افزایش      -0.07
surrogate   -0.07
Marco       -0.06
dinosaur    -0.06
.oper       -0.06
itch        -0.06
irates      -0.06
Positive Logits
People      0.09
PEOPLE      0.09
peoples     0.08
人民        0.08
Messaging   0.07
hep         0.07
たちの      0.07
vulgar      0.07
Lebanese    0.06
halk        0.06
Activations Density: 0.016%