INDEX
Explanations
"is" and "has" (verbs)
The neuron activates on occurrences of “have not been verified” (i.e. negative perfect‐passive phrases indicating unverified information).
Negative Logits
Produces     -0.06
Process      -0.06
odon         -0.06
.effect      -0.06
estion       -0.06
Voter        -0.06
Expansion    -0.06
铁           -0.06
Frankfurt    -0.05
ektor        -0.05
Positive Logits
Dave         0.07
priceless    0.07
声明         0.07
mute         0.07
кг           0.07
inheritDoc   0.07
arem         0.07
><?          0.06
overview     0.06
[_           0.06
Activations Density 0.001%
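
The page does not say how the density figure is computed. As a rough sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero, it could be calculated as below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: a single active token among 100,000 sampled positions.
sample = [0.0] * 99_999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.001% corresponds to roughly one active token per 100,000 sampled, which fits a neuron that responds to one narrow phrase.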