INDEX
Explanations
details of incidents and events mentioned in social media posts, particularly reported problems or negative experiences
references to social media platforms and their impact
Head Attr Weights
0:0.07
1:0.05
2:0.04
3:0.08
4:0.30
5:0.03
6:0.03
7:0.04
8:0.05
9:0.13
10:0.07
11:0.05
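The head attribution weights above can be ranked to see which attention heads contribute most. The following is a minimal sketch, not the dashboard's own code: the dictionary simply copies the listed values, and `top_heads` is a hypothetical helper name. The source does not say whether the weights are normalized, so the printed total is only a check on the listed (rounded) numbers.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr = {
    0: 0.07, 1: 0.05, 2: 0.04, 3: 0.08,
    4: 0.30, 5: 0.03, 6: 0.03, 7: 0.04,
    8: 0.05, 9: 0.13, 10: 0.07, 11: 0.05,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_attr.values())
top = top_heads(head_attr, k=3)

print(f"sum of listed weights: {total:.2f}")
for head, weight in top:
    # Share is relative to the listed total, not to an assumed total of 1.
    print(f"head {head}: weight {weight:.2f}, share of listed total {weight / total:.1%}")
```

On these values, head 4 carries the largest weight, followed by heads 9 and 3.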
Negative Logits
numbered: -1.66
ャ: -1.54
ukong: -1.43
nance: -1.42
nen: -1.32
wen: -1.31
APTER: -1.31
foreseen: -1.31
phased: -1.31
ヴ: -1.30
Positive Logits
Redditor: 1.68
76561: 1.60
Attribution: 1.52
memes: 1.51
hasht: 1.47
forums: 1.42
(token not shown): 1.42
speech: 1.42
(token not shown): 1.39
Tumblr: 1.37
Activations Density 0.010%
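To make the density figure concrete, the sketch below converts the percentage into a fraction and an approximate "one activation per N tokens" rate. It assumes that density means the fraction of tokens on which the feature is active; the source does not define the term, and `parse_density` is a hypothetical helper.

```python
import re

# The density line as it appears above.
line = "Activations Density 0.010%"


def parse_density(text):
    """Extract the percentage from the line and return it as a fraction (hypothetical helper)."""
    match = re.search(r"([\d.]+)\s*%", text)
    if match is None:
        raise ValueError("no percentage found in text")
    return float(match.group(1)) / 100.0


fraction = parse_density(line)
# Assumption: density = fraction of tokens with a nonzero activation.
tokens_per_activation = round(1.0 / fraction)

print(f"fraction of tokens: {fraction:.6f}")
print(f"about one activation per {tokens_per_activation} tokens")
```

Under that assumption, a feature this sparse fires on very few tokens, which is consistent with a narrowly targeted feature.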