Explanations
References to the Patriots team and to expressions of patriotism.
Negative Logits
vier      -0.20
Arena     -0.15
IDEO      -0.15
eren      -0.14
itoris    -0.14
.bits     -0.14
Gratis    -0.14
throat    -0.14
ngo       -0.14
룡        -0.14
Positive Logits
zsche     0.17
zę        0.16
upp       0.15
652       0.15
oufl      0.14
Prepared  0.14
utsche    0.14
onnement  0.14
upt       0.14
zed       0.14
Activation density: 0.005%
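The page does not say how the logit lists and the density figure were computed. A common approach, assumed here and not confirmed by this page, is to project the feature's output direction through the model's unembedding matrix and rank vocabulary tokens by the result, and to report density as the fraction of token positions where the feature's activation is above zero. The sketch below illustrates both under those assumptions; the function names `logit_lists` and `activation_density` and the array shapes are hypothetical.

```python
import numpy as np

def logit_lists(direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive per-token logit effects.

    Hypothetical setup: `direction` has shape (d_model,), `unembed` has shape
    (d_model, n_vocab), and the effect on each token is direction @ unembed.
    """
    effects = direction @ unembed          # one scalar effect per vocabulary token
    order = np.argsort(effects)            # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > 0))
```

Under this definition, a density of 0.005% is 5e-5, i.e. the feature fires on roughly one token in every 20,000.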