INDEX
Explanations
phrases indicating requests for more information
Head Attr Weights
0:0.11
1:0.25
2:0.04
3:0.05
4:0.03
5:0.11
6:0.04
7:0.03
8:0.08
9:0.07
10:0.07
11:0.07
Negative Logits
�:-1.73
ensional:-1.54
del:-1.53
Lam:-1.52
Ult:-1.50
-+-+:-1.48
latch:-1.47
ouble:-1.47
Bung:-1.44
orc:-1.44
Positive Logits
taboola:1.72
SPONSORED:1.59
reciation:1.55
udes:1.54
photo:1.54
licts:1.52
yright:1.51
=#:1.48
Citation:1.46
ItemTracker:1.45
Activations Density 0.001%