INDEX
Explanations
references to the term "hop" and its variations
Negative Logits
Token      Logit
tte        -0.20
hand       -0.19
divider    -0.18
aments     -0.17
abs        -0.16
ament      -0.15
um         -0.15
cé         -0.15
ature      -0.15
ummies     -0.15
Positive Logits
Token        Logit
Hop          0.24
itals        0.23
kins         0.23
Hop          0.23
-hop         0.22
py           0.22
hop          0.21
portunity    0.20
lamaz        0.19
ital         0.19
Activations Density: 0.010%