Explanations
reflexive pronouns and phrases
phrases indicating personal responsibility or self-reflection
Negative Logits
backdrop       -0.60
awaru          -0.58
milo           -0.58
PLA            -0.57
Footnote       -0.55
Loader         -0.55
utenberg       -0.55
TN             -0.53
Attribution    -0.53
pool           -0.53
Positive Logits
dearly
0.76
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ
0.75
wrong
0.68
praises
0.68
fools
0.68
ishers
0.67
hest
0.67
nightmares
0.67
holes
0.65
teasp
0.65
Activations Density: 0.257%