Explanations
This neuron detects disclaimer language that states affiliation, non-affiliation, or endorsement (e.g., “not affiliated,” “endorsed by,” “authorized”).
Negative Logits
inary         -0.08
metric        -0.07
.tabs         -0.07
=""           -0.06
subscription  -0.06
술            -0.06
.pipe         -0.06
alimentos     -0.06
其中          -0.06
気が          -0.06
Positive Logits
frm      0.07
EVT      0.07
fen      0.07
Ib       0.07
systemd  0.07
۱۹۴      0.06
ЛИ       0.06
.FAIL    0.06
qt       0.06
_IV      0.06
Activations Density 0.003%