Explanations
The neuron fires on words and phrases used to judge factual consistency, e.g. “fact,” “factually,” “consistent,” and “inconsistent.”
Phrases indicating the action of requesting or asking someone to do something.
Negative Logits
.LA         -0.07
_y          -0.07
ρία         -0.07
pog         -0.06
eryl        -0.06
网址        -0.06
slam        -0.06
Commons     -0.06
Penalty     -0.06
goog        -0.06
Positive Logits
marched     0.06
{lng        0.06
nominated   0.06
ньої        0.06
าหาร        0.06
mạch        0.06
castle      0.06
wildfire    0.05
.scalajs    0.05
.series     0.05
Activation Density: 0.020%
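Activation density is usually read as the fraction of tokens on which the neuron is active, so 0.020% means roughly 2 active tokens in every 10,000. The sketch below assumes "active" means an activation strictly above a threshold of 0; the threshold and the `activation_density` helper are hypothetical and may not match the definition this page uses.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy data: 10,000 token activations, only 2 of them positive.
acts = np.zeros(10_000)
acts[[17, 4242]] = [1.3, 0.8]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```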