INDEX
Explanations
phrases related to violence and social injustices
Negative Logits
osite       -0.64
CLASSIFIED  -0.61
ocry        -0.58
endum       -0.57
trak        -0.57
ongo        -0.56
Outlook     -0.56
ANC         -0.55
Reloaded    -0.55
ortium      -0.55
Positive Logits
anymore   0.87
instead   0.84
whilst    0.81
blah      0.79
while     0.78
.;        0.78
lest      0.77
outweigh  0.77
ruining   0.75
.#        0.75
Activations Density: 0.366%