INDEX
    Explanations

    phrases indicating a strong emotional reaction or emphasis

    expressions of comparison or similarity

    Negative Logits
    Token         Logit
    "alogue"      -0.75
    "hiba"        -0.73
    "arers"       -0.72
    "atform"      -0.70
    "tein"        -0.70
    "Versions"    -0.69
    "FL"          -0.68
    "Lower"       -0.67
    "ircraft"     -0.67
    "verning"     -0.66
    Positive Logits
    Token         Logit
    "lihood"      0.94
    "lier"        0.80
    "liest"       0.77
    " crazy"      0.77
    " wow"        0.70
    "liness"      0.69
    "ably"        0.68
    " goddamn"    0.65
    " parity"     0.65
    " crap"       0.64
    Activation Density: 0.069%

    No Known Activations