    Negative Logits
    Token            Logit
    ".ticket"        -0.07
    "throws"         -0.06
    "ierarchy"       -0.06
    "_dirs"          -0.06
    "jaw"            -0.06
    " axe"           -0.06
    "وران"           -0.06
    " familia"       -0.06
    "ifestyle"       -0.06
    " marriage"      -0.06
    Positive Logits
    Token            Logit
    ".↵↵↵↵↵↵↵↵"      0.08
    " Func"          0.07
    " процес"        0.07
    "YNAM"           0.06
    "};↵↵↵"          0.06
    "нообраз"        0.06
    " debunk"        0.06
    "esk"            0.06
    (blank in source) 0.06
    ".↵↵↵↵↵"         0.06
    Activation Density: 0.025%

    No Known Activations