INDEX
Explanations
quotes or direct speech used in context
phrases that express evaluations or opinions
Head Attr Weights
0:0.12
1:0.03
2:0.07
3:0.13
4:0.06
5:0.08
6:0.05
7:0.02
8:0.16
9:0.14
10:0.07
11:0.02
Negative Logits
Accessory     -1.16
�醒           -1.16
jit           -1.14
Contact       -1.13
MFT           -1.13
Downloadha    -1.07
iaz           -1.05
pora          -1.04
etus          -1.01
Topics        -1.01
Positive Logits
kered         1.21
!).           1.14
begg          1.05
adan          1.04
├             1.03
gmaxwell      1.01
:)            1.00
:-)           0.99
!.            0.99
slee          0.97
Activations Density 0.066%