INDEX
Negative Logits
Qualitative 0.71
学生 0.66
PhD 0.66
Student 0.64
IOP 0.63
Graduate 0.61
PhD 0.61
學生 0.60
STUDENT 0.59
Ph 0.59
Positive Logits
doppia 0.64
vermelho 0.63
izz 0.63
scandal 0.62
scandals 0.61
cone 0.59
совсем 0.59
worst 0.58
крас 0.57
goblin 0.57
Activations Density 0.002%