INDEX
Explanations
terms related to personal relationships and significant connections
Head Attr Weights
0:0.17
1:0.02
2:0.09
3:0.11
4:0.11
5:0.03
6:0.10
7:0.04
8:0.13
9:0.04
10:0.06
11:0.04
Negative Logits
exclusive     -1.92
Oops          -1.75
cautioned     -1.71
cautiously    -1.70
.)            -1.64
signaled      -1.58
undersc       -1.55
manually      -1.51
ensional      -1.50
termed        -1.46
Positive Logits
Favorite      2.02
etc           1.97
brance        1.66
isms          1.65
thood         1.65
ivities       1.64
isations      1.61
hyde          1.56
comings       1.56
ories         1.55
Activations Density 0.002%