INDEX
Explanations
first-person pronouns and expressions of personal experience or emotion
Negative Logits
realise       -0.17
oplan         -0.17
realization   -0.16
realizing     -0.16
Think         -0.15
ä¸įçŁ¥        -0.15
realizes      -0.15
óż            -0.15
realised      -0.15
think         -0.15
Positive Logits
indeed        0.17
should        0.16
dod           0.16
misc          0.16
made          0.15
finally       0.15
belong        0.15
potentially   0.15
Dod           0.15
possibly      0.15
Activations Density 0.195%