INDEX
Explanations
words and phrases indicating temporal references and historical context
Head Attr Weights
0:0.01
1:0.02
2:0.07
3:0.04
4:0.02
5:0.06
6:0.10
7:0.13
8:0.07
9:0.05
10:0.06
11:0.31
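As a rough illustration of how the weights above might be read, the following Python sketch ranks the heads by attribution weight. The names `head_weights` and `rank_heads` are hypothetical and not part of any tool; the values are copied from the listing, and treating them as comparable per-head shares is an assumption.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.01, 1: 0.02, 2: 0.07, 3: 0.04, 4: 0.02, 5: 0.06,
    6: 0.10, 7: 0.13, 8: 0.07, 9: 0.05, 10: 0.06, 11: 0.31,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]    # head with the largest listed weight
total = sum(head_weights.values())  # sum of the rounded weights as listed

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

On the listed values, head 11 carries the largest weight (0.31), followed by head 7 (0.13); the rounded weights sum to 0.94 rather than 1.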
Negative Logits
upper: -1.34
tomorrow: -1.18
Miracle: -1.16
ief: -1.14
Mermaid: -1.13
verse: -1.13
Doors: -1.12
exit: -1.10
doorstep: -1.10
lite: -1.09
Positive Logits
acknow: 1.36
confir: 1.36
reviewers: 1.20
examples: 1.20
deployments: 1.15
disclaim: 1.14
atell: 1.13
merce: 1.12
arlane: 1.10
iths: 1.09
Activations Density: 0.023%