Explanations
phrases indicating personal relationships
references to friendships or relationships with individuals
Negative Logits
ItemImage     -0.65
essen         -0.60
aceous        -0.60
NESS          -0.59
assumption    -0.58
evaluations   -0.58
FANT          -0.58
inability     -0.58
Presence      -0.58
];            -0.57
Positive Logits
hers          1.15
ours          1.12
yours         1.02
sorts         1.00
theirs        0.98
mine          0.91
ammad         0.72
course        0.71
irlf          0.69
hire          0.66
Activations Density: 0.102%