INDEX
Explanations
modals indicating possibility or necessity
negations and conditional phrases related to hypothetical situations
Negative Logits
isively    -0.81
pick       -0.66
wards      -0.64
perty      -0.62
cius       -0.61
izens      -0.61
izen       -0.61
luaj       -0.59
orously    -0.58
ngth       -0.58
Positive Logits
soType      0.71
OTAL        0.68
rue         0.66
GOODMAN     0.64
SPONSORED   0.64
GD          0.64
happening   0.63
Bungie      0.63
ãĥķãĤ©      0.63
happ        0.63
Activations Density 0.169%