Explanations
The neuron activates on words and word pieces referring to sequels, spin-offs, or follow-up productions.
Negative Logits
invers       -0.07
Επι          -0.07
симв         -0.06
القرن        -0.06
whoever      -0.06
inducing     -0.06
FirstChild   -0.06
detecting    -0.06
broaden      -0.06
ética        -0.06
Positive Logits
ł            0.07
anical       0.07
Makeup       0.06
삶           0.06
deviceId     0.06
-Col         0.06
bm           0.06
・           0.06
telesc       0.06
alus         0.06
Activations Density 0.027%