Explanations
mentions of characteristics or identifiers related to individuals like nationality, gender, age, and parental status
references to identity and demographic factors
Negative Logits
Token        Logit
forcement    -0.87
Reviewer     -0.82
deterrence   -0.78
forcing      -0.77
GGGGGGGG     -0.73
concluding   -0.71
suspending   -0.70
preventing   -0.69
temptation   -0.68
freeing      -0.67
Positive Logits
Token        Logit
reside       1.43
belong       1.42
hail         1.40
originate    1.20
belonged     1.18
live         1.17
lived        1.14
specialize   1.12
resided      1.08
inhabit      1.06
Activations Density 0.399%
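
As a rough illustration of what a density figure like the one above might measure, the sketch below computes the fraction of tokens on which a feature fires. This is an assumption about the metric, not something the page states: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The source page does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 tokens, 4 of which activate the feature.
sample = [0.0] * 996 + [0.7, 1.3, 2.1, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.399% would mean the feature is active on roughly 4 of every 1,000 sampled tokens.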