INDEX
Explanations
email addresses
The neuron activates on tokens that look like the dot-separated local part of an email address (e.g. name.initial).
Negative Logits
δος    -0.07
Tôi    -0.07
ord    -0.07
Drop    -0.07
arter    -0.06
Vir    -0.06
Welcome    -0.06
#for    -0.06
リア    -0.06
tableView    -0.06
Positive Logits
",");↵    0.07
.="<    0.07
이가    0.07
ελλην    0.06
.readline    0.06
كو    0.06
söylem    0.06
.amazon    0.06
��    0.06
sonucunda    0.06
Activations Density 0.004%