Explanations
references to family relationships and significant others
family relationships and personal anecdotes
Negative Logits
vernment      -0.76
pire          -0.66
arnaev        -0.66
themselves    -0.64
unin          -0.63
ensus         -0.62
utterstock    -0.61
ettlement     -0.61
ensions       -0.61
guiName       -0.60
Positive Logits
loves         0.95
my            0.89
hates         0.88
todd          0.81
didnt         0.79
my            0.79
texted        0.77
likes         0.77
bought        0.76
prefers       0.75
Activations Density: 0.150%