INDEX
Explanations
unstructured data
This neuron responds to tokens that name or indicate content-safety categories (e.g. “sexual,” “violence,” “self-harm,” “narcotics”).
Negative Logits
租  -0.06
_users  -0.06
FOR  -0.06
币  -0.06
'D  -0.06
kutje  -0.06
Bilg  -0.06
Ign  -0.06
ACCOUNT  -0.05
회사  -0.05
Positive Logits
Mezi  0.07
jednoho  0.06
inely  0.06
she  0.06
.total  0.06
angi  0.06
_destroy  0.06
_EXTENDED  0.06
ruce  0.06
rious  0.06
Activations Density 0.010%
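
The page does not say how the activation density is computed. As a rough sketch, assuming it is the fraction of sampled tokens on which the neuron's activation exceeds zero, it could be calculated as below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the neuron fires at all. The dashboard does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the neuron fires on 1 of 10,000 tokens.
sample = [0.0] * 9999 + [1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.010% would mean the neuron is active on roughly one token in ten thousand.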