INDEX
Explanations
mentions of the word "gold" and related terms
Negative Logits
メント      -0.16
dre       -0.15
ودی       -0.15
dire      -0.14
guise     -0.14
ather     -0.14
apon      -0.14
/dialog   -0.14
dia       -0.14
Tome      -0.14
Positive Logits
smith     0.27
finger    0.23
fish      0.22
stein     0.21
reich     0.20
wyn       0.19
mine      0.19
rush      0.19
Nug       0.19
trap      0.18
Activations Density 0.019%