Explanations
Occurrences of the word "up" and its variations
Negative Logits
itſelf      -0.99
themſelves  -0.97
pleaſure    -0.91
himſelf     -0.86
Efq         -0.85
neſs        -0.84
obſ         -0.84
myſelf      -0.83
whoſe       -0.83
ſeveral     -0.83
Positive Logits
down        0.53
&___        0.52
front       0.51
EndContext  0.49
gra         0.48
labelledby  0.47
Down        0.47
down        0.46
dat         0.44
loader      0.43
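The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to produce such lists for a sparse-autoencoder feature is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting scores. The sketch below assumes that approach; the names `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical, and the dashboard may scale or normalize its values differently, so this is not expected to reproduce the exact numbers shown.

```python
# Hypothetical sketch: derive ranked logit lists for one feature by
# projecting its decoder direction through the unembedding matrix.
# decoder_dir, unembed, and vocab are illustrative names, not a real API.


def logit_effects(decoder_dir, unembed):
    """Return one score per vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k lowest-scoring and k highest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: lowest score first
    positive = ranked[::-1][:k]    # descending: highest score first
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary.
vocab = ["down", "up", "front", "itself"]
unembed = [
    [0.5, 0.1, 0.4, -0.9],
    [0.0, 0.0, 0.0, 0.0],
]
effects = logit_effects([1.0, 0.0], unembed)
negative, positive = top_and_bottom(effects, vocab, 2)
print(negative)  # [('itself', -0.9), ('up', 0.1)]
print(positive)  # [('down', 0.5), ('front', 0.4)]
```

In a real model the vocabulary has tens of thousands of entries and a matrix library would do the projection in one call; plain lists are used here only to keep the sketch self-contained.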
Activation Density: 0.060%
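Activation density can be read as the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's exact threshold and sample size are not stated here); `activation_density` is a hypothetical helper. Under that reading, 0.060% corresponds to about 6 firing positions per 10,000 tokens.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where the feature's activation exceeds a threshold (assumed to be 0 here).


def activation_density(activations, threshold=0.0):
    """Fraction of positions with activation strictly above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 6 firing positions out of 10,000 sampled tokens.
acts = [0.0] * 9_994 + [1.0] * 6
density = activation_density(acts)
print(f"{density:.3%}")  # 0.060%
```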