Negative Logits
disple          0.43
disob           0.42
disapproval     0.42
undesired       0.42
unpopular       0.41
disapproved     0.41
ম্যাজিস্ট্রেট      0.41
disappro        0.40
যদি             0.39
unfavourable    0.39
Positive Logits
Than            0.41
űr              0.38
Femme           0.38
Việt            0.37
femme           0.37
ık              0.36
ă               0.36
blijven         0.36
That            0.35
Carolina        0.35
Activations Density 0.003%