INDEX
NEGATIVE LOGITS
Token	Logit
糖尿	0.39
होकर	0.38
⃖	0.37
डायरेक्शन	0.37
hayan	0.36
दुर	0.36
とのこと	0.36
ným	0.35
하여	0.35
Rek	0.35
POSITIVE LOGITS
Token	Logit
feeds	0.46
cibo	0.45
foods	0.44
foods	0.41
limites	0.40
tı	0.40
improves	0.39
embedded	0.38
results	0.38
punishments	0.37
Activations Density: 0.001%