INDEX
    Explanations

    phrases indicating patterns or common occurrences

    phrases indicating common situations or occurrences

    Negative Logits
    abases    -0.72
    istries   -0.68
    estamp    -0.67
    enez      -0.66
    æ©        -0.65
    bush      -0.63
    ERSON     -0.61
    pan       -0.61
    ema       -0.61
    oke       -0.61
    Positive Logits
     wont     0.70
    heter     0.65
     Hier     0.64
    fty       0.61
     [|       0.59
     Tale     0.59
    ums       0.58
     fare     0.58
     vari     0.58
    isite     0.58
    Activation Density: 0.164%

    No Known Activations