Explanations
research
This neuron detects hedging and implication language: phrases that qualify results and call for further study, validation, or clarification.
Negative Logits
ека         -0.06
'я          -0.06
…ط          -0.06
biggest     -0.06
inhibited   -0.06
CD          -0.06
E           -0.06
Joshua      -0.06
п           -0.06
canyon      -0.06
Positive Logits
δί          0.07
атем        0.07
하는데      0.06
lod         0.06
うち        0.06
textView    0.06
£           0.06
aget        0.06
scrollView  0.06
_VOID       0.06
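Lists like the two above are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and ranking the resulting per-token effects. The sketch below assumes that is how these values were computed; the page does not say so. The names `w_out` (the neuron's output weight vector), `W_U` (unembedding matrix, rows indexed by model dimension, columns by vocabulary), and `top_logit_effects` are hypothetical and used only for illustration.

```python
def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank vocabulary tokens by a neuron's (assumed) direct effect on the logits.

    Hypothetical shapes: w_out has length d_model; W_U is a d_model x vocab_size
    nested list; vocab is a list of vocab_size token strings.
    """
    vocab_size = len(vocab)
    # effect[j] = sum_i w_out[i] * W_U[i][j]: the neuron's contribution to token j's logit
    effects = [
        sum(w_out[i] * W_U[i][j] for i in range(len(w_out)))
        for j in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive effect
    order = sorted(range(vocab_size), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Take the k largest effects and list them from largest down
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional model with a 3-token vocabulary
neg, pos = top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [0.0, 0.0, 0.0]], ["a", "b", "c"], k=1)
print(neg, pos)
```

Ranking by the raw product ignores the nonlinear effects of later layers, so such lists describe only the neuron's direct path to the output.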
Activation density: 0.146%
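Activation density is typically reported as the share of token positions on which the neuron fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the page does not state the exact definition or threshold, and `activation_density` is a hypothetical helper name.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Toy example: the neuron fires on 2 of 8 tokens, giving a density of 25%
print(activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))
```

Under this definition, a density of 0.146% means the neuron fires on roughly 1 token in 685, which fits a narrowly selective feature such as hedging language.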