INDEX
Explanations
mentions of pronouns related to individuals and their actions or states
Negative Logits
龍契士       -0.64
Vanity       -0.64
Innocent     -0.62
Observer     -0.61
Yesterday    -0.60
NK           -0.60
Kirin        -0.59
NP           -0.59
Oriental     -0.59
Nin          -0.58
Positive Logits
'll          1.11
'd           1.08
proceeded    1.06
reverted     1.02
withdrew     1.00
recons       1.00
resumed      0.99
realized     0.98
retreated    0.97
encount      0.96
Activations Density 0.144%