NEGATIVE LOGITS
外              0.43
丁              0.42
ensos           0.40
donate          0.38
consents        0.37
Likely          0.36
கிள             0.36
कनेक्शन          0.35
штейн           0.35
Exactly         0.35

POSITIVE LOGITS
obl             0.52
onyx            0.44
stereotypical   0.43
猙              0.43
refuge          0.43
Bedürfn         0.43
rrbracket       0.42
journalistic    0.42
pornography     0.42
rectification   0.41
Activations Density: 0.003%