Explanations
phrases involving the word "by" and its related forms indicating action or attribution
Head Attr Weights
0:0.12
1:0.06
2:0.01
3:0.07
4:0.03
5:0.17
6:0.09
7:0.01
8:0.27
9:0.09
10:0.01
11:0.02
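The head attribution weights above can be read as one `head:weight` pair per line. As a minimal sketch (the function name `parse_head_weights` and the variable names are illustrative assumptions, not part of the dashboard), the listing can be parsed and ranked to see which attention heads contribute most to this feature:

```python
# Head attribution weights copied verbatim from the listing above,
# one "head:weight" pair per line.
RAW = """0:0.12
1:0.06
2:0.01
3:0.07
4:0.03
5:0.17
6:0.09
7:0.01
8:0.27
9:0.09
10:0.01
11:0.02"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict."""
    weights = {}
    for line in text.splitlines():
        head, value = line.strip().split(":")
        weights[int(head)] = float(value)
    return weights


head_weights = parse_head_weights(RAW)

# Sort heads from largest to smallest attribution weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Sum of the listed weights, rounded to the two decimals shown in the listing.
total = round(sum(head_weights.values()), 2)

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
```

With these values, head 8 ranks first (0.27), followed by head 5 (0.17) and head 0 (0.12). The listed weights sum to 0.95; whether they are meant to be normalized to 1 is not stated in the listing.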
Negative Logits
rawdownloadcloneembedreportprint  -2.75
ILCS                              -2.52
lights                            -2.40
66666666                          -2.35
enfranch                          -2.30
unique                            -2.26
"}                                -2.16
pleasures                         -2.16
lux                               -2.05
uala                              -2.04
Positive Logits
objections         2.73
leaked             2.54
proposals          2.49
misinformation     2.47
pleading           2.40
proposal           2.39
skeptics           2.36
warnings           2.35
environmentalists  2.32
skept              2.30
Activations Density 0.025%