Negative Logits
゛              0.70
anger          0.70
Sart           0.67
ANGER          0.65
rometer        0.64
launcher       0.63
দাতা           0.63
appointments   0.63
व्यास          0.62
comp           0.62
Positive Logits
ㄖ             0.76
чен            0.76
Chinese        0.74
NAL            0.74
笹             0.74
inmate         0.73
Rohingya       0.73
NELL           0.71
TikTok         0.71
тельному       0.69
Activations Density 0.001%