INDEX
Explanations
references to emotional investment and personal connection
Head Attr Weights
0:0.04
1:0.01
2:0.05
3:0.06
4:0.10
5:0.03
6:0.07
7:0.37
8:0.03
9:0.03
10:0.10
11:0.06
Negative Logits
clock                 -1.54
coding                -1.54
Clock                 -1.49
species               -1.48
ItemThumbnailImage    -1.38
spelling              -1.34
_-                    -1.33
navigating            -1.33
fingerprint           -1.33
CEPT                  -1.32
Positive Logits
gall       1.90
unden      1.87
poured     1.85
encour     1.82
urized     1.75
wash       1.73
trough     1.72
buckets    1.68
ongyang    1.61
baths      1.58
Activations Density 0.006%