INDEX
Explanations
proper nouns and names in the text
Head Attr Weights
0:0.05
1:0.01
2:0.02
3:0.10
4:0.03
5:0.13
6:0.03
7:0.21
8:0.04
9:0.02
10:0.28
11:0.02
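
The head attribution weights above can be ranked to see which heads dominate. The following is a minimal Python sketch, assuming each entry is a `head_index:weight` pair; the names `head_weights`, `ranked`, and `top3_share` are illustrative and not part of the dashboard. The listed weights sum to 0.94 rather than 1, so the share is computed against the listed total.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.05, 1: 0.01, 2: 0.02, 3: 0.10, 4: 0.03, 5: 0.13,
    6: 0.03, 7: 0.21, 8: 0.04, 9: 0.02, 10: 0.28, 11: 0.02,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top3 = ranked[:3]

# Share of the listed weight carried by the three largest heads.
total = sum(head_weights.values())
top3_share = sum(weight for _, weight in top3) / total

for head, weight in top3:
    print(f"head {head}: {weight:.2f}")
print(f"top-3 share of listed weight: {top3_share:.0%}")
```

Under that reading, heads 10, 7, and 5 carry most of the attribution for this feature.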
Negative Logits
exhilar      -2.00
years        -2.00
photograp    -1.97
exclaim      -1.97
triumph      -1.95
irresist     -1.93
danced       -1.92
comparing    -1.90
colours      -1.87
fun          -1.82
Positive Logits
het          2.20
inar         2.19
chel         2.11
pton         2.08
rahim        2.07
essen        2.06
tails        2.02
stress       1.97
chell        1.96
oppy         1.94
Activations Density 0.003%