INDEX
Explanations
sexual attributes and conformity
Negative Logits
truly        0.66
genuine      0.63
genuinely    0.62
Truly        0.55
véritable    0.50
echte        0.49
真实         0.48
真正的       0.48
really       0.47
உண்மைய       0.47
Positive Logits
якобы          0.80
conform        0.71
möglichst      0.68
supposedly     0.68
conform        0.66
superficially  0.65
Conform        0.62
conforms       0.61
conformity     0.60
marketable     0.59
Activations Density: 0.085%