    Negative Logits
    Token               Logit
    " cripp"            -0.08
    "Film"              -0.07
    "queen"             -0.07
    "_MET"              -0.07
    " Wor"              -0.06
    "arse"              -0.06
    " Commander"        -0.06
    "_AUTH"             -0.06
    " Barney"           -0.06
    " vocabulary"       -0.06
    Positive Logits
    Token               Logit
    " taxed"            0.07
    "sında"             0.07
    " článek"           0.06
    " paired"           0.06
    "<pre"              0.06
    "SmartyHeaderCode"  0.06
    " crumbling"        0.06
    "brates"            0.06
    " ((("              0.06
    ":int"              0.06
    Activation Density: 0.075%

    No Known Activations