Explanations
No Explanations Found
Negative Logits
Token    Value
1        0.82
as       0.81
with     0.75
with     0.74
for      0.71
ar       0.69
as       0.66
is       0.65
ado      0.65
on       0.63
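Positive Logits
Token                                          Value
saddened                                       0.63
در (Persian: 'in')                             0.61
)$.                                            0.56
في (Arabic: 'in')                              0.55
wretched                                       0.53
ו (Hebrew letter vav; as a prefix, 'and')      0.53
も (Japanese particle: 'also')                 0.53
ਰ (Gurmukhi letter ra)                         0.53
ד (Hebrew letter dalet)                        0.52
)، (closing parenthesis + Arabic comma)        0.52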
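Dashboards like this one commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes that convention; it is not the page's confirmed method. The function name `feature_logit_effects`, the toy weights and the tiny vocabulary are hypothetical, chosen only for illustration. It uses plain Python lists so it runs without extra dependencies.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],  # assumed shape: d_model rows x vocab_size columns
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most-promoted and k most-suppressed tokens.

    Assumes the logit effect of a feature on token j is the dot product of the
    feature's decoder direction with column j of the unembedding matrix.
    """
    d_model = len(decoder_direction)
    # effect[j] = sum_i decoder_direction[i] * unembedding[i][j]
    effects = [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by effect: most negative first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]    # most promoted tokens, most positive first
    return positive, negative


# Toy example with made-up weights (not taken from any real model).
vocab = ["as", "with", "saddened", "wretched"]
decoder_direction = [1.0, -0.5]
unembedding = [
    [-0.2, -0.1, 0.6, 0.3],
    [0.4, 0.3, -0.1, 0.2],
]
positive, negative = feature_logit_effects(decoder_direction, unembedding, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy weights, "saddened" and "wretched" rank as the most promoted tokens and "as" and "with" as the most suppressed, mirroring the shape of the lists above. On a real model the same ranking would run over the full vocabulary, usually with a vectorized matrix product instead of nested loops.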
Activation density: 9.143%
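Activation density is usually read as the share of sampled tokens on which the feature fires. A density of 9.143% would then mean the feature is active on roughly one token in eleven. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the sample activations are hypothetical, and the page's exact sampling and threshold are not stated.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: a feature "fires" when strictly above the threshold
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


# Made-up activations for eight tokens; the feature fires on two of them.
sample = [0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0]
print(f"Activation density: {activation_density(sample):.3f}%")
```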