INDEX
Explanations
This neuron is primarily triggered by the word “changes.”
Negative Logits
fort          -0.08
Petroleum     -0.08
ppl           -0.07
prefab        -0.07
Elliot        -0.07
Elliott       -0.07
(ViewGroup    -0.06
Txt           -0.06
Roosevelt     -0.06
Eagles        -0.06
Positive Logits
Changes       0.09
changes       0.09
Changes       0.08
change        0.08
as            0.07
стро          0.07
imize         0.07
-care         0.07
MS            0.07
行为          0.07
Activations Density 0.028%
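
The density figure above can be read as the share of sampled tokens on which the neuron fires. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron is active. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 3 of which activate the neuron.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would be consistent with a neuron that fires on only a narrow set of tokens, such as forms of the word "changes".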