Explanations
The neuron activates on explanations or definitions of “token” (and related terms like subword, unit of text, tokenization).
Negative Logits
Pols          -0.08
чис           -0.07
hypoth        -0.06
_ter          -0.06
Hel           -0.06
सत            -0.06
anax          -0.06
वत            -0.06
XY            -0.06
ův            -0.06
Positive Logits
selector      0.07
每            0.06
[(            0.06
перемен       0.06
indemn        0.06
<!            0.06
.gsub         0.06
PerPixel      0.06
WideString    0.06
配置          0.06
Activations Density 0.007%
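
The page does not say how the density figure is computed. As a rough sketch, assuming it is the percentage of sampled tokens on which the neuron's activation is above zero, it could be calculated as below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 7 active tokens out of 100,000.
sample = [0.0] * 99_993 + [1.2] * 7
print(f"{activation_density(sample):.3f}%")  # prints "0.007%"
```

Under this reading, a density of 0.007% means the neuron fires on roughly 7 of every 100,000 tokens, which fits a narrow trigger such as definitions of "token".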