INDEX
Explanations
the word "ga" appearing at different levels of activation
repeated mentions of the word "ga"
Negative Logits
chard        -0.79
tz           -0.79
itaire       -0.72
icable       -0.70
Cosponsors   -0.69
ecause       -0.69
sbm          -0.69
itarian      -0.68
ality        -0.67
alities      -0.67
Positive Logits
terday       0.98
ignt         0.72
enaries      0.71
Nieto        0.67
indu         0.64
arde         0.64
andise       0.61
Varg         0.61
veyard       0.61
udo          0.60
Activations Density: 0.041%