Explanations
repeated instances of the letter "s" in words
Head Attr Weights
0:0.07
1:0.37
2:0.05
3:0.04
4:0.04
5:0.15
6:0.03
7:0.02
8:0.06
9:0.04
10:0.04
11:0.02
Negative Logits
whelming    -1.95
stood       -1.92
amer        -1.80
Bret        -1.77
��          -1.75
sold        -1.72
united      -1.71
AMER        -1.64
Victoria    -1.61
Californ    -1.57
Positive Logits
ip      3.11
IP      2.52
ips     2.23
isco    2.20
ipt     2.14
Rip     2.08
iped    2.05
IP      2.00
iffe    1.94
ipop    1.94
Activations Density 0.002%