INDEX
Explanations
questions and phrases requesting explanations or clarifications
Head Attr Weights
0:0.02
1:0.00
2:0.09
3:0.32
4:0.12
5:0.02
6:0.04
7:0.11
8:0.04
9:0.04
10:0.06
11:0.07
Negative Logits
sshd       -1.52
illac      -1.51
paio       -1.46
perors     -1.44
stuffing   -1.42
arning     -1.41
assic      -1.40
igi        -1.39
dism       -1.38
��         -1.37
Positive Logits
?)         2.78
?:         2.67
?]         2.45
?????      2.40
??         2.40
?          2.26
?).        2.22
??         2.21
???        2.14
?          2.11
Activations Density 0.035%