INDEX
Explanations
listing inclusions and exceptions
Negative Logits
Token          Value
too            0.74
disadvantaged  0.73
depress        0.73
Wolves         0.72
marginalized   0.72
poop           0.70
pre            0.70
ไง             0.70   (Thai colloquial particle)
Bulldogs       0.69
disenfranch    0.69
Positive Logits
Token          Value
including      1.03
Including      1.00
incluindo      0.91   (Portuguese: "including")
except         0.85
including      0.83
.,             0.82
And            0.81
inol           0.81
直接           0.81   (Chinese: "directly")
وذلك           0.80   (Arabic: "and that")
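Dashboards of this kind often rank tokens by a feature's direct effect on the output logits. That effect comes from projecting the feature's decoder direction through the model's unembedding matrix, then keeping the top-k and bottom-k tokens. It is an assumption that the values above were produced exactly this way. The minimal pure-Python sketch below uses a made-up 2-dimensional decoder direction and a hypothetical 4-token vocabulary. `logit_effects` and `top_and_bottom` are illustrative helpers, not part of any dashboard API.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    # decoder_dir has length d_model; w_u has d_model rows and vocab_size columns.
    # The score for token t is the dot product of decoder_dir with column t of w_u.
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort (token, score) pairs from lowest to highest score.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # k lowest scores, most negative first
    positive = ranked[::-1][:k]  # k highest scores, most positive first
    return positive, negative


# Hypothetical toy values, not taken from the dashboard above.
vocab = ["including", "except", "too", "poop"]
decoder_dir = [1.0, 0.5]
w_u = [
    [1.0, 0.75, -0.5, -0.25],
    [0.5, 0.25, -0.5, -0.5],
]
scores = logit_effects(decoder_dir, w_u)
positive, negative = top_and_bottom(scores, vocab, k=2)
print(positive)  # [('including', 1.25), ('except', 0.875)]
print(negative)  # [('too', -0.75), ('poop', -0.5)]
```

In this sketch the negative list keeps its sign. A dashboard may instead display magnitudes.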
Activation Density: 0.012%
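Activation density is usually read as the share of token positions on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above zero. The `activation_density` helper and its `threshold` parameter are hypothetical. It shows how a figure like 0.012% can arise: 3 active positions out of 25,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    # Assumption: a position counts as active when its activation exceeds `threshold`.
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Hypothetical toy data: the feature fires on 3 of 25,000 token positions.
acts = [0.0] * 25_000
for i in (10, 500, 20_000):
    acts[i] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.012%
```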