INDEX
Explanations
expressions of gratitude or acknowledgment
Head Attr Weights
0:0.08
1:0.03
2:0.14
3:0.11
4:0.13
5:0.08
6:0.06
7:0.03
8:0.09
9:0.13
10:0.06
11:0.03
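
The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming each entry has the form "head:weight" as listed; the names RAW and parse_head_weights are illustrative and do not come from the dashboard itself. It ranks heads by weight and sums the listed values (which are rounded, so the total need not be exactly 1).

```python
# Head attribution weights copied from the listing above, one "head:weight" per line.
RAW = """0:0.08
1:0.03
2:0.14
3:0.11
4:0.13
5:0.08
6:0.06
7:0.03
8:0.09
9:0.13
10:0.06
11:0.03"""


def parse_head_weights(raw: str) -> dict[int, float]:
    """Parse "head:weight" lines into a {head_index: weight} mapping (hypothetical helper)."""
    weights = {}
    for line in raw.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW)

# Sort heads from largest to smallest weight; ties keep their original order.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
total = sum(weights.values())

print(ranked[:3])       # the three heads with the largest weights
print(round(total, 2))  # sum of the listed (rounded) weights
```

In this listing, head 2 has the largest weight (0.14), followed by heads 4 and 9 (0.13 each).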
Negative Logits
bryce    -1.37
��       -1.36
Bucks    -1.28
onto     -1.25
rider    -1.23
ouch     -1.19
Maver    -1.17
Shed     -1.13
rider    -1.12
wash     -1.11
Positive Logits
NEWS                                 1.45
hin                                  1.39
spection                             1.31
considerations                       1.24
!--                                  1.24
ength                                1.24
=================================    1.22
hindsight                            1.22
imaginable                           1.22
FTWARE                               1.21
Activations Density 0.005%