INDEX
Explanations
comparisons or interactions between individuals in a social setting
descriptions of personal relationships and social interactions
Negative Logits
etheless    -0.71
moil        -0.70
ornia       -0.65
sequent     -0.65
allel       -0.63
alm         -0.62
Results     -0.62
iren        -0.61
arthed      -0.61
gran        -0.61
Positive Logits
himself         0.85
remorse         0.79
fuckin          0.78
pacing          0.73
me              0.72
terrific        0.72
asleep          0.70
abras           0.68
tyr             0.68
unbelievable    0.68
Activations Density: 0.628%