Explanations
also followed by a verb
Negative Logits
Token      Logit
’          0.73
anteced    0.71
𝟯          0.70
ע          0.70
し         0.66
নে         0.66
jší        0.66
loafers    0.66
-          0.65
ਮ          0.65
Positive Logits
Token      Logit
to         1.02
t          0.90
то         0.79
<0x80>     0.76
с          0.74
да         0.73
></        0.73
(          0.72
на         0.72
h          0.69
Activations Density 1.267%