Explanations
phrases or words indicating disapproval or lack of acceptance
statements expressing condemnation or disapproval of actions deemed inappropriate or immoral
Negative Logits
ynthesis      -0.88
zyme          -0.85
tein          -0.84
pel           -0.79
oult          -0.77
ingers        -0.76
uay           -0.76
ools          -0.76
hung          -0.75
ulet          -0.74
Positive Logits
compromises    0.84
adolesc        0.82
unacceptable   0.76
behaviour      0.73
occurrences    0.72
srfAttach      0.71
undermin       0.70
ambassadors    0.69
enrichment     0.68
LY             0.68
Activation Density: 0.015%