INDEX
Explanations
references to specific individuals or entities and their associated attributes or actions
Head Attr Weights
0:0.02
1:0.01
2:0.05
3:0.07
4:0.24
5:0.02
6:0.05
7:0.33
8:0.03
9:0.02
10:0.05
11:0.06
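A minimal sketch for reading the head attribution weights listed above: it ranks the heads by weight and reports how much of the listed total the top heads carry. The names `head_attr_weights` and `top_heads` are hypothetical and chosen for illustration only. The values are copied from the list, and the sketch assumes the weights are comparable shares that can be summed.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr_weights = {
    0: 0.02, 1: 0.01, 2: 0.05, 3: 0.07, 4: 0.24, 5: 0.02,
    6: 0.05, 7: 0.33, 8: 0.03, 9: 0.02, 10: 0.05, 11: 0.06,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_attr_weights)
total = sum(head_attr_weights.values())

# Fraction of the listed total carried by the top-k heads
# (assumption: the weights can be treated as additive shares).
share = sum(w for _, w in top) / total

print(top)
print(round(total, 2), round(share, 2))
```

With the listed values, heads 7 and 4 together carry 0.57 of the 0.95 total, roughly 60%.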
Negative Logits
lance:-1.45
quartered:-1.37
wark:-1.36
olk:-1.30
gered:-1.27
rouse:-1.25
860:-1.21
YORK:-1.19
eln:-1.19
usk:-1.17
Positive Logits
names:1.87
positives:1.85
prominently:1.75
redacted:1.68
boxes:1.62
markers:1.61
lists:1.59
items:1.58
similarities:1.54
negatives:1.51
Activations Density 0.007%