INDEX
Explanations
information related to instances of violence, crime, and legal matters
Negative Logits
pires      -0.82
itates     -0.55
ceases     -0.55
sleeps     -0.54
relies     -0.52
itiz       -0.51
likes      -0.51
guiIcon    -0.50
bara       -0.49
grows      -0.49
Positive Logits
respectively    1.81
apiece          1.40
respective      0.96
themselves      0.93
together        0.88
collectively    0.79
*.              0.76
jointly         0.74
whereas         0.72
.               0.71
Activations Density 0.557%