Explanations
common English phrases
The neuron detects common high-frequency function words (e.g., “and,” “of,” “for”).
Negative Logits
Token        Logit
Fn           -0.07
agem         -0.07
remorse      -0.06
Harr         -0.06
MM           -0.06
_PA          -0.06
kas          -0.06
sins         -0.06
LT           -0.06
ね           -0.06

Positive Logits
Token        Logit
همچنین       0.07
توسعه        0.07
Francie      0.06
[]{"         0.06
истра        0.06
ъек          0.06
.AllowGet    0.06
меди         0.06
Obviously    0.06
.signIn      0.06

Activations Density 0.325%
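
The density figure above can be read as the share of sampled tokens on which the neuron fires at all. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the neuron is active; the page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 13 active tokens out of 4000 (not data from the page).
sample = [0.0] * 3987 + [0.8] * 13
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low would mean the neuron is silent on the vast majority of tokens, which is worth keeping in mind when reading the explanations above.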