INDEX
Explanations
personal interactions and relationships between individuals
Negative Logits
Cancel       -0.77
UPDATE       -0.76
Update       -0.73
pending      -0.70
confirming   -0.67
osponsors    -0.67
UPDATE       -0.63
reversal     -0.62
reversing    -0.62
Ending       -0.61
Positive Logits
tended       1.54
wore         1.38
liked        1.36
depended     1.34
loved        1.33
knew         1.30
grew         1.29
cared        1.28
hated        1.26
lived        1.26
Activations Density: 0.396%