INDEX
Explanations
occurrences of the word "the"
Head Attr Weights
0:0.03
1:0.11
2:0.08
3:0.05
4:0.02
5:0.06
6:0.08
7:0.09
8:0.15
9:0.08
10:0.12
11:0.07
Negative Logits
srf       -1.09
ÃÂ        -0.95
tradem    -0.90
colonies  -0.86
pione     -0.85
births    -0.84
majors    -0.84
theor     -0.84
uberty    -0.83
aeper     -0.82
Positive Logits
"@        1.20
Offline   1.11
"#        1.08
ESA       1.06
ory       1.01
"))       0.99
orage     0.99
Nik       0.97
seless    0.96
Snow      0.95
Activations Density 0.045%