INDEX
    Explanations

    instances of accidental events or mishaps

    Negative Logits
    shaw
    -0.16
    poon
    -0.15
     unnatural
    -0.15
    hta
    -0.15
    reator
    -0.14
    ameleon
    -0.14
    enco
    -0.14
    ulton
    -0.14
    pmat
    -0.14
    ойно
    -0.14
    Positive Logits
     forgot
    0.31
     forget
    0.31
    forgot
    0.28
    忘
    0.27
     forgotten
    0.27
     forgetting
    0.26
     accident
    0.26
    forget
    0.25
     Forgot
    0.24
     mis
    0.22
    Act Density 0.272%

    No Known Activations