INDEX
Explanations
requests and polite instructions in the text
Head Attr Weights
0:0.07
1:0.04
2:0.10
3:0.04
4:0.04
5:0.08
6:0.21
7:0.05
8:0.07
9:0.04
10:0.15
11:0.07
Negative Logits
��            -1.17
��            -1.10
tnc           -1.08
Reloaded      -1.08
alyst         -1.08
Polk          -1.06
))))          -1.05
EStreamFrame  -0.98
iencies       -0.97
fund          -0.96
Positive Logits
iquette        1.34
mbuds          1.21
swear          1.11
rimination     1.11
bombard        1.10
Handbook       1.08
:-             1.07
inconvenience  1.02
spoilers       1.02
hesitate       1.02
Activations Density 0.010%