INDEX
Explanations
names of authors, directors, and illustrators
names of authors and contributors in literature or media
Negative Logits
convictions   -0.71
silence       -0.67
ccording      -0.64
judgments     -0.64
filibuster    -0.64
arrests       -0.64
20439         -0.63
ghazi         -0.63
object        -0.62
refusal       -0.61
Positive Logits
Architects    1.26
Productions   0.99
iets          0.93
Associates    0.92
Random        0.80
Studios       0.78
(#            0.77
QC            0.76
Graphics      0.76
orks          0.75
Activations Density: 0.349%