INDEX
Negative Logits
alienated  0.54
每次  0.46
immers  0.45
appreciative  0.44
oblivious  0.44
ing  0.43
pathetic  0.43
人生  0.43
hos  0.43
hopeless  0.43
POSITIVE LOGITS
టా  0.52
𐰺  0.52
қу  0.50
ट्  0.50
Nahrung  0.49
Köln  0.48
ᖕ  0.47
Detection  0.47
Beim  0.46
бонус  0.45
Activations Density 0.000%