Explanations
strength
The neuron identifies words referring to structural strength or integrity.
Negative Logits
Token           Logit
txn             -0.07
करन             -0.07
atology         -0.06
categorie       -0.06
حن              -0.06
_cleanup        -0.06
幸福            -0.06
해결            -0.06
velocities      -0.06
southwestern    -0.06
Positive Logits
Token           Logit
Bölüm           0.08
Serge           0.07
mpjes           0.07
หลวง            0.07
_Cl             0.06
mastering       0.06
eview           0.06
hil             0.06
olum            0.06
ingle           0.06
Activations Density: 0.038%