    NEGATIVE LOGITS
    " purpoſe"   -0.88
    " enfans"    -0.87
    "ſelf"       -0.85
    " perſon"    -0.82
    " myſelf"    -0.82
    " pleaſure"  -0.80
    " ainfi"     -0.79
    " ſever"     -0.79
    " itſelf"    -0.79
    " preſent"   -0.76
    POSITIVE LOGITS
    " aus"       1.26
    " uit"       0.76
    "aus"        0.67
    " из"        0.65
    " from"      0.64
    " FROM"      0.54
    " out"       0.54
    " From"      0.52
    "from"       0.51
    " Crom"      0.49
    Activation density: 0.001%

    No Known Activations