INDEX
Explanations
the word "We" used as a collective pronoun or as a reference to a group
Head Attr Weights
0:0.02
1:0.01
2:0.06
3:0.12
4:0.15
5:0.03
6:0.13
7:0.20
8:0.04
9:0.06
10:0.04
11:0.09
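
As a reading aid, the head attribution weights listed above can be ranked to see which attention heads contribute most. The following is a minimal sketch, not part of the original page: the variable names are hypothetical, and it assumes each entry is a `head index:weight` pair. The page does not say how the weights are normalized, so the sketch only reports each head's share of the listed total.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each "i:w" entry means attention head i has attribution weight w.
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.06, 3: 0.12, 4: 0.15, 5: 0.03,
    6: 0.13, 7: 0.20, 8: 0.04, 9: 0.06, 10: 0.04, 11: 0.09,
}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)

# Total of the listed weights, and the share carried by the three largest heads.
total = sum(head_weights.values())
top3 = ranked[:3]
top3_share = sum(weight for _, weight in top3) / total

for head, weight in ranked:
    print(f"head {head:>2}: {weight:.2f} ({weight / total:.1%} of listed total)")
print(f"top-3 heads: {[head for head, _ in top3]}, share {top3_share:.1%}")
```

On these values, heads 7, 4, and 6 carry roughly half of the listed weight.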
Negative Logits
ratios       -1.50
downs        -1.46
zers         -1.40
edom         -1.36
olesc        -1.34
euth         -1.31
wolves       -1.28
apo          -1.26
hattan       -1.25
Tsukuyomi    -1.25
Positive Logits
itely        1.46
Movie        1.37
Ancient      1.29
Record       1.28
aux          1.27
Old          1.24
audio        1.23
iverse       1.23
Text         1.22
Subscribe    1.22
Activations Density 0.001%