Explanations
No Explanations Found
Negative Logits
Token       Logit
ITIES       -0.69
Cummings    -0.67
eries       -0.67
irds        -0.65
ibles       -0.65
atl         -0.63
andise      -0.63
apons       -0.63
Lamar       -0.63
Ital        -0.63
Positive Logits
Token             Logit
environmentally   0.71
degrading         0.71
superpower        0.70
insecure          0.69
electron          0.67
nanop             0.66
ideologically     0.66
dismant           0.66
dystopian         0.64
deter             0.64
Activations Density: 0.000%
No Known Activations
This feature has no known activations.