INDEX
Explanations
numerical identifiers or references
Negative Logits
Token      Logit
rush       -0.15
atcher     -0.15
rary       -0.15
agara      -0.15
endi       -0.15
illa       -0.14
ocha       -0.14
Imper      -0.14
ingham     -0.14
omba       -0.13
Positive Logits
Token      Logit
na         0.22
&↵         0.20
@show      0.17
@nate      0.17
           0.15
HWND       0.15
Na         0.15
na         0.15
NECT       0.15
_emails    0.14
Activations Density: 0.002%