INDEX
Explanations
occurrences of personal pronouns and phrases indicative of human interaction
Negative Logits
nepří         -0.16
mật           -0.15
essel         -0.14
deprivation   -0.14
depr          -0.14
derec         -0.14
heed          -0.14
Forgot        -0.13
ignorance     -0.13
shorten       -0.13
Positive Logits
slow          0.54
slower        0.50
slowed        0.49
Slow          0.49
Slow          0.47
slow          0.46
delay         0.44
Delay         0.44
delay         0.44
Delay         0.44
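A minimal sketch of how logit lists like the two above are commonly produced: the feature's decoder direction is projected through the model's unembedding matrix, and vocabulary tokens are ranked by the resulting score. This is an assumption about the method, not something the dashboard states; the function names `logit_effects` and `top_and_bottom`, and the toy matrix layout (`W_U` stored as d_model rows by vocab columns), are hypothetical.

```python
from typing import List, Tuple

def logit_effects(direction: List[float], W_U: List[List[float]]) -> List[float]:
    """Project a feature direction (length d_model) through W_U (d_model x vocab).

    Returns one score per vocabulary token: the dot product of the direction
    with that token's column of the unembedding matrix.
    """
    d_model = len(direction)
    vocab_size = len(W_U[0])
    return [
        sum(direction[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    direction: List[float], W_U: List[List[float]], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest- and k lowest-scoring tokens."""
    scores = logit_effects(direction, W_U)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]         # most negative scores first
    positive = ranked[::-1][:k]   # most positive scores first
    return positive, negative

# Toy example: a 2-dimensional model with a 3-token vocabulary.
pos, neg = top_and_bottom([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
                          ["slow", "heed", "delay"], k=1)
print(pos, neg)
```

In this toy case only the first residual dimension matters, so "slow" ranks highest and "heed" lowest.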
Activations Density 0.017%
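A density of 0.017% means the feature is active on roughly 17 of every 100,000 token positions. The sketch below assumes density is the percentage of sampled token positions whose activation exceeds zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact sampling procedure is not stated here.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 17 firing positions out of 100,000 sampled tokens gives about 0.017%.
acts = [0.0] * 99_983 + [1.2] * 17
print(activation_density(acts))
```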