INDEX
    NEGATIVE LOGITS
    Token       Logit
    "<bos>"     -0.76
    "ca"        -0.50
    " '"        -0.48
    " Imp"      -0.47
    "uz"        -0.44
    "<eos>"     -0.43
    " as"       -0.42
    "imp"       -0.42
    " ‘"        -0.42
    "iz"        -0.42
    POSITIVE LOGITS
    Token              Logit
    "findpost"         0.70
    "raszamy"          0.64
    " Theſe"           0.62
    " poffible"        0.60
    "MemoryWarning"    0.60
    " myſelf"          0.58
    " auroit"          0.56
    " étoit"           0.56
    " feroit"          0.56
    " becauſe"         0.55
    Activation Density: 2.231%

    No Known Activations