Explanations
occurrences of the word "hammer" and its variations
Negative Logits
Token    Logit
ogue     -0.17
wat      -0.17
Drain    -0.15
Ms       -0.15
Hund     -0.14
ib       -0.14
since    -0.14
urre     -0.14
Fav      -0.14
mon      -0.14
Positive Logits
Token             Logit
ujet              0.18
heck              0.16
RIPT              0.16
UDO               0.15
------+------+    0.14
positor           0.14
brains            0.14
plu               0.14
manent            0.14
.PropTypes        0.14
Activations Density: 0.006%