Explanations
instances of words related to expressing support or approval
phrases indicating support for various actions or causes
Negative Logits
ashtra    -0.71
mAh       -0.70
fing      -0.70
Tracker   -0.68
Vers      -0.66
uder      -0.66
mie       -0.66
hole      -0.61
orbit     -0.61
naire     -0.60
Positive Logits
gotten        0.73
ints          0.70
embattled     0.69
supporting    0.66
enance        0.65
vested        0.65
icans         0.65
unsupported   0.65
whichever     0.63
marginalized  0.62
Activations Density: 0.119%