    Negative Logits
    entimes    -0.89
    inia       -0.71
    riched     -0.70
    alone      -0.63
    ornings    -0.62
    ocally     -0.61
    evil       -0.61
    attery     -0.60
    venge      -0.60
    odied      -0.60
    Positive Logits
    iest        0.98
    liest       0.96
     portion    0.90
     section    0.87
     curve      0.83
     behind     0.82
     outline    0.82
     below      0.81
     specs      0.81
     timeline   0.81
    Activation Density: 0.327%

    No Known Activations