Explanations
references to follow-up actions or discussions
Negative Logits
.ToolTip    -0.16
اض    -0.15
ninger    -0.15
uts    -0.14
iete    -0.14
Äįky    -0.14
ieten    -0.14
ubi    -0.14
uli    -0.13
à¸Ļำ    -0.13
Positive Logits
follow    0.34
Follow    0.31
Follow    0.29
.follow    0.28
follow    0.27
-follow    0.25
_follow    0.21
follows    0.20
-up    0.19
FOLLOW    0.19
Activations Density: 0.007%