INDEX
    Explanations

    phrases indicating primary factors or main reasons

    Negative Logits
    Token                  Logit
    "Interpreter"          -0.14
    "ernote"               -0.14
    "TRA"                  -0.14
    "таж"                  -0.14
    "егор"                 -0.14
    "intel"                -0.13
    "hlas"                 -0.13
    " банку"               -0.13
    "cel"                  -0.13
    ".printStackTrace"     -0.13
    Positive Logits
    Token                  Logit
    " common"              0.22
    " thing"               0.21
    " things"              0.21
    " most"                0.18
    "thing"                0.18
    " commonly"            0.17
    " ninja"               0.17
    " ways"                0.16
    "Thing"                0.15
    " chief"               0.15
    Activation Density: 0.093%

    No Known Activations