INDEX
NEGATIVE LOGITS
inductively   0.94
polite        0.91
impersonal    0.84
choose        0.83
Groups        0.82
adsorb        0.79
deviate       0.79
corrected     0.78
subgroups     0.78
ANEOUS        0.78
POSITIVE LOGITS
플레이        0.79   (Korean, "play")
9             0.76
8             0.76
1             0.72
play          0.71
Play          0.71
7             0.70
游客          0.70   (Chinese, "tourist/visitor")
3             0.70
2             0.69
ACTIVATION DENSITY: 0.000%