Explanations
No Explanations Found
Negative Logits
ogyn        -0.79
abilia      -0.76
ugu         -0.74
elling      -0.74
sted        -0.74
tainment    -0.73
soever      -0.70
antry       -0.68
rican       -0.67
pell        -0.66
Positive Logits
veyard      0.82
suppress    0.64
supp        0.63
concent     0.62
activation  0.62
hypert      0.62
orbits      0.61
Mountains   0.61
toggle      0.60
suppressed  0.59
Activations Density: 0.000%
No Known Activations
This feature has no known activations.