Explanations
the presence of the pronoun "I"
Head Attribution Weights
Head 0: 0.10
Head 1: 0.07
Head 2: 0.08
Head 3: 0.07
Head 4: 0.09
Head 5: 0.07
Head 6: 0.06
Head 7: 0.08
Head 8: 0.08
Head 9: 0.08
Head 10: 0.09
Head 11: 0.08
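The weights above can be read as how the feature's decoder is spread across the 12 attention heads. The following is a minimal sketch of one way such weights could be computed. It assumes, though this page does not say so, that each weight is the L2 norm of one head's slice of the feature's decoder row divided by the sum of those norms over all heads. The function name `head_attr_weights` and the contiguous per-head layout are hypothetical.

```python
import math


def head_attr_weights(decoder_vec, n_heads):
    """Share of a feature's decoder norm that falls in each head's slice.

    Hypothetical layout: decoder_vec is the feature's decoder row over the
    concatenated per-head outputs, stored as n_heads contiguous blocks of
    equal size d_head.
    """
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads

    # L2 norm of each head's block of the decoder row.
    norms = []
    for h in range(n_heads):
        block = decoder_vec[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in block)))

    # Normalize so the weights sum to 1 (assumed normalization).
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    return [n / total for n in norms]


# Toy usage: two heads with d_head = 2; block norms are 5 and 1.
weights = head_attr_weights([3.0, 4.0, 0.0, 1.0], n_heads=2)
print(weights)
```

Under this assumption the weights always sum to 1, so a fairly even spread like the one listed (0.06 to 0.10 against a uniform 1/12 ≈ 0.083) would mean that no single head dominates the feature.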
Negative Logits
Saud      -2.90
�         -2.68
女        -2.64
dysph     -2.60
Sins      -2.60
【        -2.55
¯         -2.52
Roses     -2.51
ebted     -2.50
Sever     -2.49
Positive Logits
broadcasters   2.69
astronomy      2.64
optics         2.57
Accuracy       2.52
Cosmos         2.52
antennas       2.51
ereo           2.40
endo           2.40
broadcaster    2.38
amplified      2.35
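The two logit lists rank vocabulary tokens by how strongly the feature's output direction raises or lowers their logits. The following is a minimal sketch of that ranking. It assumes that the feature's direction has already been expressed in residual-stream space (a feature defined on per-head outputs would first need mapping through the output projection), and it ignores the final LayerNorm. The function `top_logit_effects` and its argument layout are hypothetical.

```python
def top_logit_effects(direction, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature direction.

    direction: list of d_model floats (hypothetical residual-stream
        direction for the feature).
    W_U: unembedding as d_model rows, each a list of vocab_size floats.
    vocab: list of vocab_size token strings.
    Returns (positive, negative) lists of (token, logit) pairs.
    """
    vocab_size = len(vocab)

    # logits[t] = sum_i direction[i] * W_U[i][t]  (direction @ W_U)
    logits = [0.0] * vocab_size
    for d, row in zip(direction, W_U):
        for t in range(vocab_size):
            logits[t] += d * row[t]

    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return positive, negative


# Toy usage: d_model = 2, vocabulary of three tokens.
pos, neg = top_logit_effects(
    direction=[1.0, -1.0],
    W_U=[[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(pos, neg)
```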
Activation Density: 0.000%
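Activation density is the share of sampled token positions on which the feature fires. The following is a minimal sketch. It assumes that "fires" means an activation strictly above a threshold of 0, because the page does not state its threshold. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation
    exceeds the threshold (assumed to be 0 by default)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy usage: the feature fires on 2 of 4 token positions.
print(activation_density([0.0, 0.5, 0.0, 2.0]))
```

Because the value is shown to three decimal places, a displayed 0.000% only means that the measured density is below 0.0005%. In other words, the feature fired on fewer than about 1 in 200,000 sampled tokens, or never.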