Explanations
interactions on social media platforms
Head Attribution Weights
Head    Weight
0       0.13
1       0.05
2       0.06
3       0.08
4       0.06
5       0.09
6       0.09
7       0.05
8       0.08
9       0.11
10      0.10
11      0.04
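The head weights can be ranked programmatically. The sketch below copies the displayed values into a dictionary and returns the heads with the largest weight. The helper name `top_heads` is hypothetical. The displayed values sum to 0.94 rather than 1, and the dashboard does not say whether that gap comes from rounding or from how the weights are defined.

```python
# Head attribution weights as displayed on the dashboard (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.13, 1: 0.05, 2: 0.06, 3: 0.08, 4: 0.06, 5: 0.09,
    6: 0.09, 7: 0.05, 8: 0.08, 9: 0.11, 10: 0.10, 11: 0.04,
}


def top_heads(weights, k=3):
    """Return the k head indices with the largest weight, highest first (hypothetical helper)."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


print(top_heads(HEAD_ATTR_WEIGHTS))               # [0, 9, 10]
print(round(sum(HEAD_ATTR_WEIGHTS.values()), 2))  # 0.94
```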
Negative Logits
Token             Logit
pall              -1.03
shred             -0.98
understandable    -0.98
vit               -0.97
into              -0.97
gd                -0.95
foremost          -0.89
lon               -0.88
trumpet           -0.88
Loren             -0.88
Positive Logits
Token             Logit
yip               1.57
Interstitial      1.38
ebin              1.26
ombies            1.26
sidx              1.16
LESS              1.16
bookmark          1.13
Pastebin          1.13
cycles            1.12
Cancel            1.05
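The two lists show the ten most negative and ten most positive logit values for this feature. Assuming the dashboard starts from one scalar logit effect per vocabulary token (how that effect is computed is not stated here), producing the lists reduces to a top-k and bottom-k query. A minimal sketch follows. The function `top_and_bottom_logits` and the neutral tokens in the toy vocabulary are hypothetical; only the four feature tokens and their values come from the dashboard.

```python
import heapq


def top_and_bottom_logits(logit_effects, k=10):
    """Split a {token: logit_effect} mapping into the k most positive and k most negative entries."""
    positive = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    return positive, negative


# Toy vocabulary: four values copied from the dashboard plus two hypothetical neutral tokens.
effects = {"yip": 1.57, "Interstitial": 1.38, "pall": -1.03, "shred": -0.98, "the": 0.01, "and": -0.02}
pos, neg = top_and_bottom_logits(effects, k=2)
print(pos)  # [('yip', 1.57), ('Interstitial', 1.38)]
print(neg)  # [('pall', -1.03), ('shred', -0.98)]
```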
Activation Density: 0.005%
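Activation density is usually read as the share of token positions on which the feature fires. Under that reading, with a firing threshold of 0 (both are assumptions; the dashboard does not define them), 0.005% corresponds to roughly one active token in every 20,000. A minimal sketch with a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Convert the displayed percentage to a fraction and to "1 in N tokens".
density_pct = 0.005
fraction = density_pct / 100
print(round(1 / fraction))  # 20000

# Toy example: one active position out of 20,000 gives the same density.
acts = [0.0] * 19_999 + [3.2]
print(activation_density(acts) * 100)  # density as a percentage
```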