INDEX
Explanations
phrases related to accountability and taking a stand against law enforcement actions beyond what is necessary
expressions of personal opinions and experiences
Negative Logits
ML            -0.55
]),           -0.55
unexpectedly  -0.52
orsi          -0.51
®             -0.50
osponsors     -0.49
NAS           -0.49
awei          -0.49
¶             -0.48
beloved       -0.47
Positive Logits
)."           0.65
gotta         0.61
gonna         0.61
remorse       0.60
.""           0.60
anooga        0.59
ause          0.57
fuckin        0.56
.'"           0.56
however       0.56
Activations Density: 1.045%