INDEX
Explanations
references to clarity and conciseness in writing
Head Attr Weights
0:0.01
1:0.01
2:0.17
3:0.14
4:0.14
5:0.02
6:0.04
7:0.15
8:0.04
9:0.05
10:0.08
11:0.09
Negative Logits
"},"        -1.54
opter       -1.49
UGE         -1.43
nings       -1.41
apsed       -1.41
ieved       -1.40
vier        -1.39
�           -1.38
ゼウス      -1.38
TextColor   -1.36
Positive Logits
Powers      1.51
Maxim       1.47
Awareness   1.35
Hunt        1.34
Omn         1.33
appell      1.33
Mush        1.32
Wiki        1.31
simplicity  1.29
Debate      1.29
Activations Density 0.007%