INDEX
Explanations
phrases or terms associated with acceptability or standards
Negative Logits
Autoritní       -0.40
tours           -0.36
read            -0.35
tour            -0.35
crow            -0.35
Ferri           -0.35
writing         -0.35
mortar          -0.34
in              -0.34
trus            -0.33
Positive Logits
acceptable      2.03
Acceptable      1.89
Acceptable      1.86
acceptable      1.85
unacceptable    1.46
ceptable        1.36
acceptability   1.32
accep           1.14
tolerable       1.09
accep           1.09
Activations Density: 0.009%