Explanations
No Explanations Found
Negative Logits
cedes       -0.74
pite        -0.73
osity       -0.70
afort       -0.67
anium       -0.67
arse        -0.66
avering     -0.66
osponsors   -0.64
jad         -0.62
negie       -0.62
Positive Logits
rosis       0.70
Ò           0.67
naissance   0.65
Sherman     0.65
imental     0.63
Abrams      0.61
GREEN       0.60
sov         0.60
hipp        0.59
EST         0.58
Activations Density: 0.000%
No Known Activations
This feature has no known activations.