INDEX
    Explanations

    phrases indicating positive quality or approval

    Negative Logits
    Token        Logit
    " Sour"      -0.18
    "oras"       -0.16
    "overs"      -0.15
    "oops"       -0.15
    "escaping"   -0.14
    "atik"       -0.14
    "angs"       -0.14
    "asaki"      -0.14
    "ande"       -0.14
    "ented"      -0.13
    Positive Logits
    Token        Logit
    "reads"      0.18
    "liest"      0.18
    "resse"      0.17
    "night"      0.16
    "bye"        0.16
    "lier"       0.15
    "owler"      0.15
    "意思"       0.15
    "loe"        0.15
    "isy"        0.15
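
    One of the positive-logit tokens is stored in the raw vocabulary as the string `æĦıæĢĿ`. That looks like the byte-to-unicode alphabet used by GPT-2-style byte-level BPE tokenizers, in which each raw byte is shown as one printable character. The sketch below assumes that encoding (the page itself does not name the tokenizer) and reverses it to recover the underlying UTF-8 text. The function names `bytes_to_unicode` and `decode_bpe_token` are illustrative; the first mirrors the mapping published with GPT-2.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style byte-level BPE.

    Assumption: the vocabulary shown on this page uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_bpe_token(token: str) -> str:
    """Turn a byte-level BPE vocabulary string back into readable text."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


print(decode_bpe_token("æĦıæĢĿ"))  # prints 意思
```

    Under this assumption the token decodes to the Chinese word 意思 ("meaning"). The same mapping explains why tokens with a leading space appear with a `Ġ` prefix in raw vocabulary dumps.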
    Activation Density: 0.057%

    No Known Activations