INDEX
    Explanations

    phrases or sentences enclosed within quotation marks

    quotation marks and their adjacent content

    Negative Logits
    Token          Logit
    " rall"        -0.63
    " Azerb"       -0.62
    " derby"       -0.61
    " destro"      -0.60
    " affiliate"   -0.59
    " seasoned"    -0.59
    " adjud"       -0.57
    " sympath"     -0.55
    " quartz"      -0.55
    " rul"         -0.55

    Positive Logits
    Token          Logit
    "SELECT"        0.93
    "WHERE"         0.83
    "false"         0.82
    "Dear"          0.81
    "too"           0.80
    "Hello"         0.80
    "WE"            0.79
    "Hey"           0.79
    "smart"         0.78
    "dist"          0.78
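    The two lists above are the ten lowest- and ten highest-scoring tokens for this feature. The dashboard does not say how each score is produced. A common approach is to project the feature's decoder direction through the model's unembedding, but that is an assumption here. The sketch below covers only the ranking step. It takes an already-computed per-token score mapping and returns the k most negative and k most positive entries. The function name `top_logits` is hypothetical.

```python
from typing import Dict, List, Tuple

def top_logits(logits: Dict[str, float], k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: split a token -> score mapping into the
    k most negative and k most positive (token, score) pairs."""
    # Sort ascending by score; the front holds the most negative tokens.
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse so the most positive tokens come first.
    positive = ranked[::-1][:k]
    return negative, positive

# Toy input built from a few rows of the tables above.
scores = {" rall": -0.63, " Azerb": -0.62, " quartz": -0.55, "SELECT": 0.93, "WHERE": 0.83}
neg, pos = top_logits(scores, k=2)
print(neg)  # [(' rall', -0.63), (' Azerb', -0.62)]
print(pos)  # [('SELECT', 0.93), ('WHERE', 0.83)]
```

    Tokens are kept as exact strings so that leading spaces survive. In the tables, " rall" and "rall" would be different tokens.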
    Act Density 0.121%
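    Activation density is usually read as the share of tokens on which the feature fires. This sketch assumes the common definition, a strictly positive activation over some token sample, because the dashboard does not state its threshold or its sample. The helper name `activation_density` is hypothetical. Under that assumption, 0.121% corresponds to roughly 121 active tokens per 100,000.

```python
import math
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation is
    strictly above `threshold` (assumed to be 0 for an SAE feature)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 121 active tokens out of 100,000 gives a density of 0.121%.
acts = [0.0] * 99_879 + [1.0] * 121
print(activation_density(acts))  # 0.121
```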

    No Known Activations