INDEX
Explanations
references to video game elements and comparisons to popular culture
Negative Logits
如      -0.16
像      -0.16
like    -0.15
ooke    -0.15
wie     -0.14
wu      -0.14
oted    -0.14
ži      -0.13
Pare    -0.13
ành     -0.13
Positive Logits
except   0.55
except   0.49
Except   0.47
Except   0.45
minus    0.45
minus    0.40
_except  0.35
Minus    0.31
except   0.31
vậy      0.27
Activations Density 0.275%