Explanations
it appears to detect specific phrases or keywords, but the provided activations alone do not make clear what the neuron is looking for
mentions of economic concepts or factors
Negative Logits
subpoena      -0.86
redacted      -0.79
Hayden        -0.79
warrant       -0.77
Adams         -0.76
Rutherford    -0.76
subpoen       -0.75
Belichick     -0.74
Clapper       -0.73
Cheney        -0.73
Positive Logits
Vill          1.31
Residents     1.26
Rohing        1.16
villagers     1.16
Tour          1.12
Girls         1.10
Farm          1.07
Women         1.07
obyl          1.06
India         1.05
Activations Density: 0.589%