INDEX
Explanations
instances of the word "down" in various contexts
Negative Logits
ador      -0.17
ocities   -0.16
tings     -0.15
làng      -0.15
xd        -0.15
ings      -0.15
Derrick   -0.14
quo       -0.14
ted       -0.14
adores    -0.14
Positive Logits
playing   0.31
plays     0.30
play      0.30
played    0.28
PLAY      0.24
graded    0.22
grading   0.22
grades    0.21
-play     0.21
plays     0.20
Activations Density: 0.018%