INDEX
Explanations
No Explanations Found
Negative Logits
Token     Logit
ema       -0.85
eco       -0.82
baugh     -0.82
raw       -0.81
aan       -0.79
rawl      -0.79
ongo      -0.76
ugg       -0.73
Crew      -0.72
undown    -0.72
Positive Logits
Token     Logit
Glas      0.71
Roose     0.70
ampl      0.70
Huck      0.68
magn      0.66
Democr    0.63
beams     0.63
Amen      0.63
Carnegie  0.63
respons   0.62
Activations Density: 0.000%
No Known Activations
This feature has no known activations.