INDEX
Explanations
terms that express certainty or emphasis in a statement
phrases and terms related to discrimination or prejudice against particular groups
Head Attr Weights
0:0.06
1:0.03
2:0.18
3:0.07
4:0.06
5:0.10
6:0.03
7:0.02
8:0.10
9:0.24
10:0.04
11:0.03
Negative Logits
amins:-1.50
Jr:-1.32
Mich:-1.30
ゼウス:-1.29
inar:-1.25
Fra:-1.25
usercontent:-1.24
Bett:-1.23
Calif:-1.22
resil:-1.21
Positive Logits
clauses:1.34
binding:1.30
revised:1.29
memor:1.27
borrowing:1.22
derivatives:1.21
iths:1.16
transfer:1.15
ticking:1.14
merged:1.13
Activations Density 0.006%