INDEX
    Explanations
    Negative Logits
    " trophies"     -0.08
    "elerinin"      -0.08
    " collection"   -0.07
    " Wald"         -0.07
    "iscard"        -0.07
    ".favorite"     -0.07
    " statt"        -0.07
    " Vernon"       -0.07
    " сот"          -0.07
    ".attribute"    -0.07
    Positive Logits
    "js"            0.17
    "JS"            0.15
    " JS"           0.15
    "-js"           0.12
    "(JS"           0.12
    " js"           0.11
    "/js"           0.10
    "Js"            0.10
    "\tjs"          0.09
    "_js"           0.09
    Activation Density: 0.008%

    No Known Activations