Explanations
negation words
The neuron activates on words that signal comparison or contrast (e.g., "than", "less", "rather", "only", "not") or that emphasize degree when describing trade-offs.
Negative Logits
tart        -0.08
Nd          -0.07
SEM         -0.06
_types      -0.06
OCR         -0.06
Tricks      -0.06
stab        -0.06
Compile     -0.06
kart        -0.06
Rut         -0.06
Positive Logits
倍           0.07
">--}}↵      0.07
hotelu       0.06
"}; ↵        0.06
quota        0.06
/system      0.06
안내         0.06
getLast      0.06
DISCLAIMER   0.06
Available    0.06
Activation density: 0.060%
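A density of 0.060% means the feature fires on roughly 6 of every 10,000 tokens. The minimal sketch below shows how such a figure could be computed. It assumes that "density" is the percentage of token positions whose activation is strictly greater than a threshold of 0.0; the dashboard does not state its exact definition, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption (not confirmed by the dashboard): a token counts as "active"
    when its activation is strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    # Count positions where the feature fires above the threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Example: 3 active tokens out of 5,000 positions.
acts = [0.0] * 5000
acts[10] = acts[200] = acts[4321] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.060%
```

Raising `threshold` above zero would give a stricter notion of "firing" and a lower density for the same activations.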