INDEX
Explanations
Repetitive/Nonsensical Text
The neuron strongly activates on the word “hostility.”
Negative Logits
Knight      -0.07
Knight      -0.07
_MANY       -0.06
-phone      -0.06
propName    -0.06
_pipeline   -0.06
ры          -0.06
plist       -0.06
.tf         -0.06
metrics     -0.06
Positive Logits
(Py           0.07
�             0.06
ควร           0.06
شمالی         0.06
respectfully  0.06
quoise        0.06
.wp           0.06
.ForeColor    0.06
Gameplay      0.06
Hercules      0.06
Activations Density 0.013%