INDEX
Explanations
expressions of self-reflection or personal notes
Head Attr Weights
0:0.07
1:0.03
2:0.02
3:0.06
4:0.12
5:0.20
6:0.04
7:0.03
8:0.21
9:0.12
10:0.02
11:0.03
Negative Logits
:\          -2.61
acca        -2.41
SourceFile  -2.36
MpServer    -2.21
orest       -2.15
efer        -2.09
"></        -2.00
ridor       -2.00
��          -2.00
ibrary      -1.99
Positive Logits
tentative     2.31
revisions     2.27
phases        2.19
pitches       2.06
intentions    2.06
discontin     2.04
Seek          2.03
experimental  1.99
hypotheses    1.97
FINAL         1.95
Activations Density 0.000%