INDEX
Explanations
references to individuals being highlighted or singled out in various contexts
Head Attr Weights
0:0.04
1:0.01
2:0.07
3:0.06
4:0.07
5:0.03
6:0.03
7:0.42
8:0.03
9:0.04
10:0.06
11:0.08
Negative Logits
loop       -1.59
gif        -1.44
hyd        -1.40
bonds      -1.37
decay      -1.36
oppable    -1.36
ramids     -1.33
iland      -1.30
conserv    -1.28
rio        -1.24
Positive Logits
singled        1.78
grave          1.77
agher          1.59
zinski         1.48
��             1.46
scorn          1.45
unfairly       1.41
bies           1.40
collaborators  1.39
exceptions     1.38
Activations Density 0.002%