INDEX
Explanations
pronouns, particularly those associated with individuals, indicating their actions or states
Negative Logits
Token       Logit
Marriage    -0.69
Fashion     -0.63
duc         -0.63
Domestic    -0.63
Dominion    -0.62
apex        -0.60
Sussex      -0.58
ogue        -0.58
piring      -0.57
Gad         -0.57
Positive Logits
Token         Logit
'd            1.14
personally    1.02
encount       1.02
consulted     0.98
regretted     0.97
awoke         0.96
'll           0.93
hoped         0.89
've           0.85
heard         0.85
Activations Density: 0.097%