INDEX
    Explanations

    narratives or phrases that reveal surprising outcomes or conclusions

    Negative Logits
    [token missing]    -0.61
    "illoin"           -0.60
    "claimer"          -0.59
    "ślę"              -0.56
    "ztály"            -0.55
    " οπο"             -0.55
    "harapkan"         -0.55
    "amemnon"          -0.52
    "ftagPool"         -0.51
    "ilosop"           -0.51
    Positive Logits
    " ternyata"    0.86
    " actually"    0.85
    "原来"    0.82
    "原來"    0.81
    " bleek"       0.76
    " مشين"    0.74
    " Ternyata"    0.73
    " blijkt"      0.72
    "actually"     0.71
    "Actually"     0.71
    Act Density 0.380%

    No Known Activations