Explanations
No Explanations Found
Negative Logits
cial: -0.92
merce: -0.78
Interstitial: -0.75
interstitial: -0.73
swick: -0.73
cially: -0.72
nw: -0.72
LV: -0.70
nell: -0.69
Genius: -0.67
Positive Logits
cutoff: 0.72
veto: 0.72
coerc: 0.72
stakes: 0.70
membership: 0.69
prosec: 0.68
rul: 0.68
anamo: 0.65
backing: 0.65
steering: 0.64
Activation density: 0.000%
No Known Activations
This feature has no known activations.