Explanations
words and phrases associated with questions, challenges, and controversial statements
Head Attr Weights
0:0.01
1:0.02
2:0.05
3:0.06
4:0.09
5:0.02
6:0.48
7:0.06
8:0.03
9:0.03
10:0.04
11:0.04
Negative Logits
ALSE
-1.32
photo
-1.29
waivers
-1.29
lineup
-1.28
False
-1.22
January
-1.22
fen
-1.20
****************
-1.20
�士
-1.15
{"
-1.12
Positive Logits
inki
1.35
bottleneck
1.33
wield
1.31
cest
1.29
fficient
1.29
asso
1.27
osate
1.24
ockets
1.23
ishly
1.22
Telescope
1.21
Activations Density 0.043%