INDEX
Explanations
references to related topics or sections in the text
Head Attr Weights
0:0.04
1:0.05
2:0.06
3:0.15
4:0.13
5:0.04
6:0.07
7:0.10
8:0.04
9:0.06
10:0.10
11:0.12
Negative Logits
jri: -1.44
iour: -1.29
invariably: -1.29
usually: -1.29
collar: -1.27
carriage: -1.26
Morty: -1.25
laz: -1.25
龍喚士: -1.25
tread: -1.24
Positive Logits
LIVE: 1.56
Emails: 1.48
VID: 1.40
Closing: 1.36
Wanted: 1.34
Video: 1.34
Critics: 1.31
Legal: 1.30
moot: 1.28
Concern: 1.27
Activations Density 0.015%