INDEX
Explanations
words related to things that are doubtful, questionable, or of suspect nature
terms associated with doubt and untrustworthiness
Negative Logits
olon        -0.94
ften        -0.91
hner        -0.86
ynthesis    -0.84
eding       -0.79
ummer       -0.78
eded        -0.76
brate       -0.76
OVA         -0.75
hower       -0.75
Positive Logits
questionable    1.03
dubious         0.95
legality        0.86
suspic          0.76
foul            0.69
indisc          0.69
commodities     0.68
amounts         0.68
astronomical    0.68
unethical       0.67
Activations Density: 0.012%