INDEX
Explanations
Introduction
The neuron activates specifically on the document’s “Introduction” section heading.
Negative Logits
$t          -0.07
YYYY        -0.07
якій        -0.07
梨          -0.07
:id         -0.06
Port        -0.06
Remix       -0.06
Survey      -0.06
staining    -0.06
บอล         -0.06
Positive Logits
?'↵↵            0.07
نامج            0.06
юрид            0.06
็ว              0.06
نسب             0.06
quantidade      0.06
misrepresented  0.06
regulates       0.06
ững             0.06
چند             0.06
Activations Density: 0.009%
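
The density figure can be read as a simple ratio. The sketch below assumes, and this page does not confirm it, that density means the percentage of sampled tokens on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of tokens where the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 9 active tokens out of 100,000.
acts = [0.0] * 99_991 + [1.5] * 9
density = activation_density(acts)
print(f"{density:.3f}%")
```

Under that reading, a density of 0.009% corresponds to roughly 9 firing tokens per 100,000, which fits a neuron tied to a single, rare heading token.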