INDEX
Explanations
phrases related to physical actions or instructions
punctuation, particularly commas and their context in sentences
Negative Logits
Token          Logit
etheless       -0.77
ozy            -0.69
inctions       -0.67
zb             -0.66
rastructure    -0.66
payer          -0.65
ãĥīãĥ©         -0.65
rak            -0.64
unct           -0.63
iz             -0.62
Positive Logits
Token          Logit
hoping         1.07
but            1.05
joking         1.05
lest           1.02
reminding      1.00
implying       1.00
prompting      0.99
saying         0.94
insisting      0.93
pretending     0.93
Activations Density 0.350%
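A density figure like the one above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below shows that calculation under this assumption; the source does not define the metric, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Made-up sample: 7 active tokens out of 2000.
    sample = [1.5] * 7 + [0.0] * 1993
    print(f"{activation_density(sample):.3%}")
```

With the made-up sample, 7 active tokens out of 2000 give a density of 0.0035, the same order of magnitude as the value reported here.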