INDEX
Explanations
commas and other punctuation indicating separate clauses or items in a sentence
Head Attr Weights
0:0.02
1:0.01
2:0.14
3:0.06
4:0.07
5:0.03
6:0.07
7:0.17
8:0.07
9:0.02
10:0.12
11:0.17
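A minimal sketch for ranking the heads by weight, assuming each entry above is a `head index:attribution weight` pair (the page does not say so explicitly). The names `parse_head_weights`, `top_heads`, and `total` are hypothetical and used only for illustration.

```python
# Head attribution weights copied verbatim from the listing above.
RAW = "0:0.02 1:0.01 2:0.14 3:0.06 4:0.07 5:0.03 6:0.07 7:0.17 8:0.07 9:0.02 10:0.12 11:0.17"


def parse_head_weights(text):
    """Parse whitespace-separated 'head:weight' items into {head: weight}."""
    weights = {}
    for item in text.split():
        head, value = item.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW)

# Sort by descending weight; ties are broken by the lower head index.
ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
top_heads = [head for head, _ in ranked[:4]]

# Sum of the displayed (rounded) weights.
total = round(sum(weights.values()), 2)

print(top_heads)  # [7, 11, 2, 10]
print(total)      # 0.95
```

Under that assumption, heads 7, 11, 2, and 10 carry the largest weights. The displayed values sum to 0.95 rather than 1.00, which may simply reflect rounding to two decimals.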
Negative Logits
ombat        -1.68
��           -1.60
Gleaming     -1.58
urbed        -1.55
culosis      -1.52
undown       -1.50
itionally    -1.50
ibly         -1.48
oak          -1.47
anwhile      -1.45
Positive Logits
Winc            1.42
Cumm            1.39
collaborator    1.35
mates           1.31
rivals          1.30
publishers      1.25
RAW             1.24
sibling         1.24
param           1.23
Grind           1.22
Activations Density 0.057%
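To put the density figure in concrete terms, here is a rough sketch. It assumes the density is the percentage of tokens on which the feature activates at all, which the page does not define. The helper `expected_activations` is hypothetical.

```python
def expected_activations(density_percent, n_tokens):
    """Expected number of activating tokens, given a density in percent."""
    return density_percent / 100.0 * n_tokens


# Density copied from the line above.
DENSITY_PERCENT = 0.057

per_million = round(expected_activations(DENSITY_PERCENT, 1_000_000))
tokens_per_activation = round(100.0 / DENSITY_PERCENT)

print(per_million)            # 570
print(tokens_per_activation)  # 1754
```

Under that reading, the feature fires on about 570 tokens per million, or roughly once every 1,754 tokens.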