INDEX
Explanations
phrases related to analyzing and discussing behavior or actions of individuals
descriptive phrases and behavioral assessments of individuals
Negative Logits
concludes      -0.72
orthy          -0.63
EDITION        -0.62
Investigators  -0.61
underscores    -0.61
licens         -0.60
Shap           -0.59
Gutenberg      -0.59
illustration   -0.58
listings       -0.58
Positive Logits
sul            0.87
pissed         0.87
annoyed        0.82
fucked         0.81
messed         0.80
noticeably     0.80
goof           0.79
stubborn       0.78
behaved        0.77
kinda          0.77
Activations Density: 0.903%