Explanations
phrases related to weighing advantages and disadvantages
references to the advantages of a situation or concept, often described as "pros."
Negative Logits
ading     -0.70
DER       -0.69
ATA       -0.68
MORE      -0.68
aded      -0.66
ashes     -0.66
OWS       -0.63
owship    -0.63
orf       -0.62
raped     -0.62
Positive Logits
pros      1.25
ocial     1.10
Pros      1.00
outwe     0.91
aic       0.90
yip       0.87
Pros      0.86
pse       0.86
cephal    0.84
daq       0.83
Activations Density 0.009%