INDEX
NEGATIVE LOGITS
ương         0.51
骈           0.49
stimulation  0.47
ζ            0.46
дела         0.45
separator    0.45
jezika       0.44
scissor      0.44
Zh           0.44
俔           0.43
POSITIVE LOGITS
genuinely    0.78
honestly     0.63
echt         0.60
Honestly     0.57
真的         0.55
REALLY       0.54
virkelig     0.54
genu         0.54
sincerely    0.54
是真的       0.52
Activations Density 0.062%