INDEX
Explanations
discovery, explanation
references to modern technological innovations or inventions.
This neuron responds to mentions of humans or the word “human.”
Negative Logits
IF            -0.07
илися         -0.06
Kot           -0.06
ring          -0.06
Y             -0.06
rejected      -0.06
gaming        -0.06
prer          -0.06
_PAGE         -0.06
applicants    -0.06
Positive Logits
λοι           0.07
ItemStack     0.07
Nova          0.07
hôm           0.06
avons         0.06
Αγ            0.06
%+            0.06
航空          0.06
_INTERVAL     0.06
gratis        0.06
Activations Density: 0.036%