INDEX
Explanations
mentions of historical or real-world events and settings
references to reading and related activities
Head Attr Weights
0:0.07
1:0.02
2:0.21
3:0.07
4:0.09
5:0.05
6:0.02
7:0.01
8:0.22
9:0.14
10:0.04
11:0.02
Negative Logits
akable          -1.33
porous          -1.23
xious           -1.04
ofi             -1.03
confounding     -1.02
easiest         -1.02
onomous         -1.01
ounced          -1.00
negotiators     -0.98
atu             -0.97
Positive Logits
DragonMagazine  1.65
aloud           1.60
VIDEOS          1.54
版              1.54
dayName         1.35
Pastebin        1.29
oslav           1.26
Digest          1.25
Dictionary      1.24
papers          1.24
Activations Density 0.019%