INDEX
Explanations
details about historical events and characters
phrases related to the passage of time and change
Head Attr Weights
0:0.05
1:0.02
2:0.05
3:0.21
4:0.04
5:0.07
6:0.02
7:0.08
8:0.05
9:0.10
10:0.18
11:0.09
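
The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming each line is a `head:weight` pair; the names `RAW_WEIGHTS`, `parse_head_weights`, and `top_heads` are hypothetical and not part of any tool referenced here. The source does not state how the weights were computed or whether they are normalized, so the sum is reported only as a sanity check.

```python
# Head attribution weights copied from the listing above, one "head:weight" per line.
RAW_WEIGHTS = """\
0:0.05
1:0.02
2:0.05
3:0.21
4:0.04
5:0.07
6:0.02
7:0.08
8:0.05
9:0.10
10:0.18
11:0.09
"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a {head_index: weight} dict (hypothetical helper)."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


weights = parse_head_weights(RAW_WEIGHTS)
print(top_heads(weights))               # the three heads with the largest weights
print(round(sum(weights.values()), 2))  # total of the listed (rounded) weights
```

On the values listed, heads 3, 10, and 9 carry the largest weights. The listed values are rounded to two decimals, so their total need not equal exactly 1.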
Negative Logits
pick: -0.91
ンジ: -0.90
del: -0.88
cess: -0.87
dropping: -0.87
uty: -0.86
leasing: -0.84
Camp: -0.81
english: -0.81
Scholars: -0.81
Positive Logits
hump: 0.88
ieri: 0.85
kward: 0.83
──: 0.83
Nunes: 0.82
comprehension: 0.82
orius: 0.81
fundament: 0.80
happened: 0.80
applies: 0.79
Activations Density 0.763%