Explanations
Phrases involving direct speech, particularly quotations that convey important statements or sentiments.
Head Attr Weights
0: 0.03
1: 0.01
2: 0.24
3: 0.06
4: 0.10
5: 0.02
6: 0.03
7: 0.16
8: 0.06
9: 0.03
10: 0.07
11: 0.12
Negative Logits
etter: -1.55
AJ: -1.35
resear: -1.34
Hak: -1.32
ransom: -1.31
editor: -1.29
keynote: -1.27
iga: -1.26
Benedict: -1.24
christ: -1.24
Positive Logits
interstitial: 2.19
NetMessage: 1.82
��: 1.59
////////////////: 1.56
ntil: 1.52
Temperature: 1.47
shed: 1.43
leness: 1.40
lest: 1.40
WHERE: 1.39
Activation Density: 0.008%