Explanations
phrases related to personal characteristics and attributes, particularly in the context of identity and belonging
pronouns related to identity and group association
Negative Logits
forcing       -0.90
forcement     -0.86
freeing       -0.83
concluding    -0.76
preventing    -0.76
temptation    -0.73
tightening    -0.72
retrieving    -0.71
limits        -0.71
forcing       -0.70
Positive Logits
belong        1.47
belonged      1.39
lived         1.32
reside        1.25
graduated     1.22
resided       1.18
specialize    1.14
hail          1.13
grew          1.12
married       1.11
Activations Density: 0.413%