INDEX
Explanations
words that express inclusivity or commonly reference groups
Negative Logits
ThroughAttribute    -0.76
LabelTagHelper      -0.76
TagMode             -0.70
propOrder           -0.70
réguli              -0.67
greateſt            -0.67
ystema              -0.64
Cedric              -0.64
noires              -0.61
Theſe               -0.61
Positive Logits
đều                 0.83
nhau                0.58
very                0.56
bajo                0.56
enumi               0.56
being               0.55
were                0.54
very                0.54
tocin               0.53
urti                0.52
Activations Density: 0.199%