INDEX
Explanations
assertions about possibilities and hypothetical situations
"It would be" or similar conditional phrases
evaluative judgment
Negative Logits
出版年        -0.64
uxxxx         -0.64
poffe         -0.63
uſe           -0.63
chofe         -0.63
uſed          -0.60
fubject       -0.59
Diſ           -0.59
themſelves    -0.59
Majefty       -0.59
Positive Logits
unfair          1.11
wrong           1.09
foolish         1.07
wrong           0.93
unjust          0.92
unreasonable    0.92
unwise          0.91
silly           0.90
folly           0.89
fair            0.86
Activations Density: 0.358%