INDEX
Explanations
references to comparisons or analogies involving significant historical events or figures
Head Attr Weights
0:0.01
1:0.03
2:0.09
3:0.08
4:0.11
5:0.03
6:0.02
7:0.37
8:0.03
9:0.03
10:0.05
11:0.11
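
The head attribution weights above can be read programmatically. The following is a minimal sketch, not the dashboard's own code: it assumes the weights are available as plain `head:weight` lines exactly as listed, and the names `HEAD_WEIGHTS_TEXT` and `parse_head_weights` are hypothetical. It parses the listing, finds the head with the largest weight, and sums the displayed values.

```python
# Head attribution weights copied verbatim from the listing above
# (format assumed to be "head_index:weight", one pair per line).
HEAD_WEIGHTS_TEXT = """\
0:0.01
1:0.03
2:0.09
3:0.08
4:0.11
5:0.03
6:0.02
7:0.37
8:0.03
9:0.03
10:0.05
11:0.11
"""


def parse_head_weights(text):
    """Return a dict mapping head index (int) to attribution weight (float)."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(HEAD_WEIGHTS_TEXT)

# Head with the largest attribution weight.
top_head = max(weights, key=weights.get)

# Sum of the displayed (rounded) weights.
total = sum(weights.values())

print(f"top head: {top_head} (weight {weights[top_head]:.2f})")
print(f"sum of displayed weights: {total:.2f}")
```

On the values listed, head 7 carries the largest weight (0.37), and the displayed weights sum to about 0.96 rather than 1, which may simply reflect rounding to two decimals.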
Negative Logits
URE          -1.89
ivo          -1.62
alde         -1.59
itol         -1.56
duction      -1.48
otto         -1.45
esse         -1.44
URES         -1.42
Newsletter   -1.42
iott         -1.42
Positive Logits
verty        1.78
Huss         1.60
hov          1.53
pros         1.52
Kard         1.41
Gur          1.40
intens       1.39
��           1.38
tatt         1.36
Caucas       1.34
Activations Density 0.009%