Negative Logits
ो              0.56
’,             0.54
ă              0.54
menjelaskan    0.54
า              0.53
ะ              0.53
'              0.52
have           0.49
Have           0.48
aturan         0.48
Positive Logits
encouragement  1.20
Encour         1.14
encour         1.06
encour         1.05
discouraged    1.03
鼓励           1.02
encouraging    0.98
Encourage      0.97
encourage      0.94
鼓勵           0.91
Activations Density 0.037%