INDEX
Explanations
phrases indicating the presence of specific items or components
Head Attr Weights
0:0.05
1:0.03
2:0.20
3:0.04
4:0.22
5:0.06
6:0.02
7:0.03
8:0.05
9:0.18
10:0.04
11:0.02
Negative Logits
uthor:-1.58
illon:-1.43
usk:-1.34
ndra:-1.31
illac:-1.28
uesday:-1.23
estones:-1.22
emanc:-1.21
vic:-1.20
alli:-1.20
Positive Logits
Blumenthal:1.46
Gork:1.34
Morty:1.29
Nord:1.25
Kell:1.23
Kers:1.20
Milo:1.16
Klu:1.15
Katrina:1.15
riers:1.14
Activations Density: 0.004%