INDEX
Explanations
key ideas and significant topics in discussions
Head Attribution Weights (head: weight)
Head 0: 0.03
Head 1: 0.05
Head 2: 0.12
Head 3: 0.05
Head 4: 0.02
Head 5: 0.05
Head 6: 0.06
Head 7: 0.08
Head 8: 0.22
Head 9: 0.06
Head 10: 0.07
Head 11: 0.13
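Per-head weights like these can be illustrated with a small sketch. It assumes, and the dashboard does not confirm this, that the feature's decoder row has length n_heads * d_head (as in an attention-output SAE trained on concatenated head outputs). It also assumes that each head's weight is the L2 norm of its chunk divided by the sum of all chunk norms. The function name is hypothetical.

```python
import math


def head_attribution_weights(decoder_row, n_heads):
    """Hypothetical sketch: split one feature's decoder row (length n_heads * d_head)
    into contiguous per-head chunks and return each chunk's L2 norm as a share
    of the summed norms. The normalization scheme is an assumption."""
    if n_heads <= 0 or len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be a positive multiple of n_heads")
    d_head = len(decoder_row) // n_heads
    # L2 norm of each head's slice of the decoder row
    norms = [
        math.sqrt(sum(x * x for x in decoder_row[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    return [n / total for n in norms]


# Two heads with d_head = 2: chunk norms are 5.0 and 5.0, so each head gets half.
print(head_attribution_weights([3.0, 4.0, 0.0, 5.0], n_heads=2))
```

Under this scheme the weights sum to 1. The twelve displayed values add up to 0.94, so the dashboard may round or normalize differently.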
Negative Logits (token, logit)
atever     -1.24
assed      -1.18
sucked     -1.18
wered      -1.16
osite      -1.16
ustomed    -1.15
ready      -1.14
raped      -1.12
peeled     -1.12
ripe       -1.12
Positive Logits (token, logit)
promotions      1.23
crowdfunding    1.17
profiling       1.14
advocacy        1.14
Minority        1.14
sbm             1.11
FAQ             1.11
senal           1.10
Rush            1.07
panic           1.05
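Top positive and negative logit lists like the two above are commonly produced by projecting a feature's output direction onto the unembedding matrix and ranking tokens by the result. The sketch below assumes that the direction is already expressed in the residual stream and that unembed[i][j] maps residual dimension i to token j. It ignores the final layer norm. Its function and argument names are hypothetical and not the dashboard's own code.

```python
def top_logit_tokens(direction, unembed, vocab, k=10):
    """Hypothetical sketch: score every token by direction . unembed[:, j] and
    return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    d_model = len(direction)
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative logits
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return positive, negative


# Toy example: 2 residual dimensions, 3 tokens.
pos, neg = top_logit_tokens(
    [1.0, -1.0],
    [[2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]],
    ["a", "b", "c"],
    k=1,
)
print(pos, neg)
```

Ranking the full list once and slicing both ends keeps the positive and negative lists consistent with each other.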
Activation Density: 0.020%
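Activation density is read here as the fraction of sampled token positions on which the feature fires. A density of 0.020% is 0.0002, or roughly 1 token in 5,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The dashboard's exact threshold and sampling are not stated, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of token positions whose feature
    activation is strictly greater than `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 2 firing positions out of 10,000 sampled tokens -> 0.0002, i.e. 0.020%
acts = [0.0] * 9998 + [1.5, 0.3]
print(f"{activation_density(acts):.3%}")
```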