INDEX
Explanations
occurrences of the word "the" in various contexts
Head Attr Weights
0:0.02
1:0.04
2:0.10
3:0.12
4:0.01
5:0.02
6:0.11
7:0.07
8:0.11
9:0.20
10:0.06
11:0.10
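
A minimal sketch of how the head weights above might be ranked, assuming each entry is a `head index:weight` pair for a 12-head attention layer. The names `head_attr_weights` and `top_heads` are illustrative and not part of the dashboard; the listing does not say how the weights are normalized.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each "i:w" entry pairs an attention-head index with its weight.
head_attr_weights = {
    0: 0.02, 1: 0.04, 2: 0.10, 3: 0.12,
    4: 0.01, 5: 0.02, 6: 0.11, 7: 0.07,
    8: 0.11, 9: 0.20, 10: 0.06, 11: 0.10,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    Python's sort is stable, so heads with equal weights keep index order.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_attr_weights.values())
top3 = top_heads(head_attr_weights)

print("top heads:", top3)
print("sum of listed weights:", round(total, 2))
```

As displayed, the weights add up to 0.96 rather than 1.00, which may simply reflect rounding to two decimals. Head 9 carries the largest single share (0.20).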
Negative Logits
Deliver  -1.05
ーテ  -1.02
habi  -1.01
Cho  -0.96
bour  -0.96
ゴン  -0.96
legate  -0.95
orate  -0.94
peer  -0.91
egu  -0.89
Positive Logits
sake  2.47
purposes  1.98
reasons  1.65
ummies  1.37
ulz  1.32
icion  1.30
foreseeable  1.27
erity  1.22
izoph  1.15
iencies  1.15
Activations Density 0.078%