Explanations
wordplay
This neuron detects the kind of explanatory language used when pointing out puns or phonetic similarities (e.g. tokens like “sounds,” “similar,” and quoted word comparisons).
Negative Logits
brane           -0.06
_mid            -0.06
242             -0.06
otate           -0.06
_management     -0.06
caffeine        -0.06
omial           -0.06
ad              -0.06
_ctr            -0.05
čil             -0.05
Positive Logits
inclusive       0.07
BTN             0.07
diamond         0.07
اقتص            0.07
mari            0.06
alınan          0.06
rule            0.06
prostituerade   0.06
JEXEC           0.06
品               0.06
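The page does not say how the logit lists are produced. A common approach for neuron dashboards is to score every vocabulary token by the neuron's direct path to the output: the neuron's output weight vector dotted with each token's unembedding column, then sorted to take the k most negative and k most positive tokens. The sketch below assumes that method; `w_out`, `w_u`, and both helper functions are hypothetical names used only for illustration, and the toy matrix is invented data, not this neuron's weights.

```python
from typing import List, Sequence, Tuple


def direct_logit_effects(w_out: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a neuron's output weights onto every token's unembedding column.

    w_out: the neuron's output weight vector, length d_model (hypothetical name).
    w_u:   unembedding matrix as d_model rows x vocab_size columns (hypothetical name).
    """
    d_model = len(w_out)
    vocab_size = len(w_u[0])
    return [sum(w_out[d] * w_u[d][t] for d in range(d_model)) for t in range(vocab_size)]


def top_and_bottom_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ordered = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ordered[:k]          # most negative first
    positive = ordered[::-1][:k]    # most positive first
    return negative, positive


# Toy example (invented data): d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
w_out = [1.0, -1.0]
w_u = [
    [0.5, 0.0, 0.2, -0.1],
    [0.0, 0.3, 0.2, 0.1],
]
negative, positive = top_and_bottom_logits(direct_logit_effects(w_out, w_u), vocab, k=2)
print(negative)  # tokens "b" then "d"
print(positive)  # tokens "a" then "c"
```

Because the score ignores everything downstream of the neuron except the unembedding, lists built this way often contain unrelated or multilingual tokens with small magnitudes, which would be consistent with the values of roughly ±0.06 shown here.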
Activation Density: 0.038%
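Activation density is usually read as the fraction of token positions in the sampled text on which the neuron fires, so 0.038% corresponds to about 38 firing positions per 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above 0.0 (the page does not state its threshold); `activation_density` is a hypothetical helper and the activations are invented toy values.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the neuron's activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption; the dashboard does not state its cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy activations (invented data) for five token positions.
acts = [0.0, 1.2, -0.3, 0.0, 0.5]
density = activation_density(acts)
print(f"{density:.1%}")  # 2 of 5 positions fire -> 40.0%
```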