Explanations
No Explanations Found
Negative Logits
Token           Logit
bm              -0.72
bors            -0.67
MC              -0.65
leton           -0.64
Interstitial    -0.64
lu              -0.64
Row             -0.63
ione            -0.62
TS              -0.62
Downing         -0.62
Positive Logits
Token           Logit
perspect        0.78
chimpanzees     0.77
Deng            0.72
Omaha           0.70
agon            0.69
hostage         0.68
helicop         0.67
urgently        0.67
weap            0.67
drones          0.66
Activations Density: 0.000%
No Known Activations
This feature has no known activations.