Negative Logits
Ń       0.42
fony    0.41
ne      0.41
NE      0.41
šnje    0.40
uly     0.39
rimps   0.39
𝘂       0.38
PDE     0.38
UMIRE   0.37
Positive Logits
tired       0.39
cracked     0.38
tiredness   0.38
consenting  0.37
Cracked     0.37
Raff        0.37
raf         0.37
alternate   0.36
adventure   0.35
Mous        0.35
Activations Density 0.010%