Explanations
phrases related to political speeches or statements
repeated references to collective pronouns, particularly "we."
Negative Logits
tains          -0.74
uces           -0.71
advertisement  -0.64
lights         -0.63
ulence         -0.60
VERTISEMENT    -0.59
laughs         -0.59
mund           -0.59
Leopard        -0.57
imal           -0.57
Positive Logits
akening        1.26
owe            1.23
're            1.22
need           1.19
cannot         1.12
've            1.11
must           1.10
ought          1.05
'll            1.00
deserve        1.00
Activations Density: 0.162%