INDEX
NEGATIVE LOGITS
cleanly        0.69
normally       0.66
intermedia     0.66
sanity         0.66
тип            0.65
intera         0.63
cleanup        0.61
paper          0.61
પહો            0.60
Idam           0.60
POSITIVE LOGITS
irresponsible  0.67
advocating     0.66
encourage      0.65
unethical      0.65
aventuras      0.64
advocated      0.64
Advocacy       0.63
probiotics     0.63
incroyable     0.63
suicidal       0.62
Activations Density 0.000%