INDEX
Explanations
mentions of interpersonal interactions and relationships
themes of interpersonal relationships and social interactions
Negative Logits
umes         -0.73
omission     -0.65
Prediction   -0.61
cens         -0.60
Detected     -0.60
Seller       -0.59
Baird        -0.59
Shap         -0.59
suppression  -0.59
anasia       -0.59
Positive Logits
exchanged    1.27
mutually     1.23
parted       1.21
bonded       1.14
reconcil     1.11
both         1.09
banter       1.07
discuss      1.04
spar         1.03
both         1.02
Activations Density: 0.297%