INDEX
Explanations
phrases that indicate content from various sources, particularly those introducing or citing textual passages
Head Attribution Weights (head: weight)
0:0.02
1:0.06
2:0.13
3:0.03
4:0.02
5:0.09
6:0.08
7:0.07
8:0.08
9:0.22
10:0.07
11:0.08
Negative Logits
aukee        -1.29
swing        -1.27
recovering   -1.27
slump        -1.25
headaches    -1.25
coun         -1.20
settle       -1.12
reckoning    -1.12
babys        -1.12
productive   -1.12
Positive Logits
initials     1.43
Dictionary   1.40
LU           1.36
Koran        1.36
insign       1.35
"@           1.35
packets      1.35
Emin         1.34
acronym      1.31
solete       1.28
Activation Density: 0.020%