INDEX
    Explanations

    phrases starting with the word "These"

    Negative Logits
    Token        Logit
    "Adds"       -0.74
    "obar"       -0.70
    "rupted"     -0.67
    "obook"      -0.66
    "let"        -0.66
    "mma"        -0.65
    "Ģ"          -0.64
    "ossier"     -0.64
    "ga"         -0.64
    "terness"    -0.64

    Positive Logits
    Token        Logit
    " guys"       1.31
    " aren"       1.29
    " are"        1.28
    " kinds"      1.16
    " weren"      1.13
    " days"       1.11
    " sorts"      1.06
    " dudes"      1.05
    " were"       1.05
    " fellows"    1.01
    Activation Density: 0.090%

    No Known Activations