INDEX
    Explanations

    references to historical or cultural context and their implications

    Negative Logits
    è̳
    -0.14
    akeup
    -0.14
    upo
    -0.13
    },"
    -0.13
    lier
    -0.13
    lei
    -0.13
    hid
    -0.13
     Definitely
    -0.13
    inverse
    -0.13
    eneg
    -0.12
    Positive Logits
     sort
    0.23
    sort
    0.20
     kind
    0.17
    acon
    0.14
    _kind
    0.14
    æĻ´
    0.14
     actually
    0.14
    bakan
    0.14
    izi
    0.14
     kinds
    0.13
    Act Density: 0.001%

    No Known Activations