INDEX
    Explanations

    phrases related to actions or interactions

    the presence of the word "the" and its function in various contexts

    Negative Logits
    unin            -0.75
    Indian          -0.73
    Bal             -0.72
    hemy            -0.68
    oin             -0.67
    ð               -0.67
    thood           -0.67
     Maced          -0.65
    ère             -0.64
    nir             -0.63
    Positive Logits
    stretched       0.79
     wrinkles       0.73
     basics         0.71
     flyers         0.69
    OSP             0.66
     baseline       0.66
     frustrations   0.66
    println         0.61
     reluct         0.61
     laundry        0.61
    Activation Density 0.269%

    No Known Activations