INDEX
    Explanations

    the word "so" indicating a logical conclusion or continuation in a sentence

    Negative Logits
    Token           Logit
    " Cree"         -0.67
    " Neigh"        -0.64
    "ropolitan"     -0.60
    " Kids"         -0.59
    " burg"         -0.59
    " Dre"          -0.58
    "WN"            -0.58
    "gie"           -0.57
    " Tact"         -0.57
    " Milan"        -0.57
    Positive Logits
    Token           Logit
    " forth"        1.45
    "forth"         1.08
    "bered"         1.03
    "othe"          1.01
    "oths"          0.98
    "apy"           0.97
    "ooo"           0.88
    "oooo"          0.86
    "oner"          0.83
    " far"          0.81
    Activation Density: 0.026%

    No Known Activations