Explanations
references to time periods and historical context
Head Attr Weights
0:0.01
1:0.04
2:0.10
3:0.19
4:0.01
5:0.02
6:0.09
7:0.09
8:0.07
9:0.10
10:0.08
11:0.11
Negative Logits
_-        -1.26
onite     -1.14
quished   -1.13
ys        -1.12
unlike    -1.07
wr        -1.02
seless    -1.01
adj       -1.01
isen      -1.00
kept      -0.98
Positive Logits
asons         1.14
trave         1.13
destro        1.10
clearance     1.08
successfully  1.02
fundament     0.99
idate         0.99
redes         0.97
urances       0.95
amate         0.94
Activations Density 0.009%