INDEX
    Explanations

    the word "so" with varying degrees of emphasis

    Negative Logits
    "theless"        -0.72
    "works"          -0.61
    "work"           -0.58
    " expectancy"    -0.56
    " Purg"          -0.55
    "amac"           -0.52
    " Flavoring"     -0.52
    " Halls"         -0.52
    " eviction"      -0.51
    " Mens"          -0.51

    Positive Logits
    "bered"           1.31
    "oths"            1.26
    "apy"             1.21
    "othes"           1.18
    "oooo"            1.11
    "othe"            1.11
    "ooo"             1.09
    "oner"            1.05
    " far"            0.96
    "aps"             0.94

    Act Density 0.255%

    No Known Activations