INDEX
Explanations
pronouns 'his' or 'her' followed by possessive or descriptive words
references to individuals and their associated attributes or roles
Negative Logits
ÑĮ          -0.84
Reviewed    -0.78
agues       -0.77
isode       -0.74
vier        -0.73
oke         -0.73
cells       -0.73
akings      -0.72
itutes      -0.72
okes        -0.71
Positive Logits
penchant       1.65
inability      1.53
propensity     1.49
tendency       1.45
willingness    1.44
insistence     1.42
unwillingness  1.38
refusal        1.38
reluctance     1.27
obsession      1.27
Activations Density: 0.242%