INDEX
Explanations
phrases starting with "I" and verb phrases detailing personal actions and experiences
statements involving the speaker's thoughts or experiences
Negative Logits
hyde      -0.71
staking   -0.68
ategory   -0.64
Poe       -0.63
mone      -0.62
agos      -0.62
assad     -0.61
senal     -0.60
eleph     -0.59
tradem    -0.59
Positive Logits
Ĥª                  0.82
ãĤ»                 0.79
Learned             0.75
iverse              0.71
natureconservancy   0.71
happened            0.70
azes                0.69
nutshell            0.67
boils               0.67
asty                0.67
Activations Density 0.132%