Explanations
The neuron fires on text belonging to the comments or moderation policy sections of an article.
Negative Logits
_cell     -0.07
Future    -0.07
<Person   -0.07
menu      -0.06
228       -0.06
CAM       -0.06
icip      -0.06
Style     -0.06
动物      -0.06
dan       -0.06
Positive Logits
червня    0.07
ibilidad  0.07
uber      0.07
gezocht   0.06
arenas    0.06
stretch   0.06
NECT      0.06
ambiance  0.06
ीत        0.06
estring   0.06
Activations Density: 0.008%