INDEX
Explanations
code names
The neuron activates on placeholder model or project identifiers of the form “NAME_1”, firing on the tokens that make up the placeholder name.
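As a rough illustration of the kind of text this explanation describes, the sketch below finds placeholder identifiers such as “NAME_1” in a string. The regular expression and the helper name `find_placeholders` are assumptions made for illustration. The source only attests the form “NAME_1”, so matching other numeric suffixes is a guess.

```python
import re

# Assumption: placeholders look like NAME_<digits>.
# Only "NAME_1" is attested in the explanation above.
PLACEHOLDER_RE = re.compile(r"\bNAME_\d+\b")


def find_placeholders(text: str) -> list[str]:
    """Return every placeholder identifier found in text, in order of appearance."""
    return PLACEHOLDER_RE.findall(text)
```

The word boundaries keep the pattern from matching inside longer identifiers such as `USERNAME_1`.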
Negative Logits
flawless  -0.06
ประเภท  -0.06
Hindered  -0.06
hatır  -0.06
Airport  -0.06
NotImplementedError  -0.06
_fence  -0.06
StatusCode  -0.06
ække  -0.06
★  -0.06
Positive Logits
stab  0.07
Fancy  0.07
Alicia  0.06
ASA  0.06
fora  0.06
mia  0.06
wifi  0.06
ню  0.06
freopen  0.06
psz  0.06
Activations Density 0.049%
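Assuming the density figure is the fraction of sampled tokens on which the neuron has a nonzero activation (the page does not define it), 0.049% corresponds to roughly 49 active tokens per 100,000. A minimal sketch under that assumption, using a hypothetical helper `activation_density` and made-up sample data:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 49 active tokens out of 100,000.
sample = [1.0] * 49 + [0.0] * 99_951
print(f"{activation_density(sample):.3%}")
```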