Explanations
references to relationships or connections among people
Head Attr Weights
0:0.01
1:0.01
2:0.10
3:0.07
4:0.28
5:0.04
6:0.03
7:0.21
8:0.03
9:0.06
10:0.05
11:0.06
Negative Logits
iners:-1.47
obin:-1.44
oslav:-1.42
quit:-1.40
aughs:-1.39
iversal:-1.38
opped:-1.35
andowski:-1.33
bum:-1.32
inters:-1.32
Positive Logits
workings:1.95
whereabouts:1.71
cloaked:1.58
nuances:1.55
worm:1.53
specifics:1.53
origins:1.52
particulars:1.48
birthplace:1.44
shrouded:1.42
Activations Density 0.080%