INDEX
Explanations
phrases related to importance or urgency
the word "is" and its variations to express existence or statements about being
Negative Logits
lance        -0.84
yip          -0.69
sonian       -0.63
Blades       -0.63
alities      -0.60
Moines       -0.60
Locations    -0.58
uates        -0.56
ggles        -0.55
imil         -0.55
Positive Logits
ever          1.35
EVER          1.22
spared        0.94
immune        0.94
remotely      0.90
necessarily   0.90
harmed        0.89
allowed       0.88
truly         0.88
allowed       0.83
Activations Density: 0.091%