INDEX
Explanations
Attends from tokens that are part of usernames or channel names to tokens that denote social media or platform links.
Head Attr Weights
0:0.07
1:0.08
2:0.15
3:0.13
4:0.06
5:0.03
6:0.22
7:0.22
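As a reading aid, the head attribution weights listed above can be ranked to see which heads contribute most. The following is a minimal sketch, not part of the dashboard itself: the names `head_attr_weights` and `rank_heads` are hypothetical, and the values are copied from the list above.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr_weights = {
    0: 0.07, 1: 0.08, 2: 0.15, 3: 0.13,
    4: 0.06, 5: 0.03, 6: 0.22, 7: 0.22,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight, ties by head index."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = rank_heads(head_attr_weights)
total = sum(head_attr_weights.values())

for head, weight in ranked:
    # Share is relative to the sum of the listed (rounded) weights.
    print(f"head {head}: {weight:.2f} ({weight / total:.0%} of listed total)")
```

Heads 6 and 7 tie for the largest weight at 0.22 each. The listed values sum to 0.96 rather than 1.00, which is presumably a rounding effect.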
Negative Logits
autorytatywna     -0.41
kasarigan         -0.34
defaultstate      -0.33
ویکیپدی           -0.33
mergeFrom         -0.31
:✨                -0.30
}');              -0.30
SerializedSize    -0.30
jspb              -0.29
MigrationBuilder  -0.29
Positive Logits
Slee              0.29
ISTAT             0.28
catalyzed         0.27
Turch             0.27
bech              0.26
Luce              0.26
chao              0.25
emancipation      0.25
Eman              0.25
ilat              0.25
Activations Density 0.039%
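The page does not define activation density. Assuming it means the fraction of token positions on which the feature's activation is nonzero, it could be computed along the following lines. This is a sketch under that assumption; `activation_density` and the toy data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts positions with activation > 0.
    The dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 39 active positions out of 100,000 corresponds to 0.039%.
toy_activations = [1.0] * 39 + [0.0] * (100_000 - 39)
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.039% means the feature fires on roughly 4 of every 10,000 token positions.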