Explanations
left behind, harmed, excluded
Negative Logits
𝚝              0.72
seseorang      0.72
bekannte       0.69
Surrounded     0.68
anean          0.68
dola           0.68
δήποτε         0.67
accompanied    0.67
ചെയ്യാ         0.66
hensible       0.66
Positive Logits
left           1.69
left           1.38
Left           1.34
Left           1.30
harmed         1.26
cheated        1.22
excluded       1.21
wronged        1.21
devastated     1.20
helpless       1.18
Activations Density 0.067%