INDEX
Explanations
phrases related to actions taken or desired by individuals or groups
expressions of intent or actions and the associated consequences
Negative Logits
Token                 Logit
ggles                 -0.68
pora                  -0.68
externalActionCode    -0.65
Redditor              -0.64
por                   -0.63
EngineDebug           -0.63
requent               -0.63
cedented              -0.62
emn                   -0.61
è¦ļéĨĴ                -0.61
Positive Logits
Token                 Logit
themselves            1.59
their                 1.09
THEIR                 1.01
their                 0.85
us                    0.82
me                    0.81
Their                 0.80
selves                0.80
Their                 0.75
selves                0.73
Activations Density: 0.675%
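
The density figure is most naturally read as the fraction of sampled token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the synthetic sample (27 active positions out of 4000) are all hypothetical and chosen only so the arithmetic is easy to follow; none of them come from the dashboard's underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    nonzero (above-threshold) activation. This is an illustrative
    definition, not one confirmed by the dashboard.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example data (hypothetical): 4000 token positions,
# of which 27 carry a positive activation.
sample = [0.0] * 4000
for i in range(27):
    sample[i * 100] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density below 1% indicates a sparse feature: it is silent on the large majority of tokens and fires only in a narrow set of contexts.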