    Explanations

    phrases indicating ongoing problems or unresolved issues

    Negative Logits
    Token        Logit
    " Rough"     -0.17
    "xef"        -0.16
    "frau"       -0.15
    "vere"       -0.14
    "urette"     -0.14
    "ccb"        -0.14
    "riba"       -0.14
    "eru"        -0.14
    "erner"      -0.13
    " leh"       -0.13
    Positive Logits
    Token        Logit
    " still"     0.25
    " Still"     0.21
    "still"      0.19
    "还是"       0.19
    "Still"      0.18
    " STILL"     0.17
    " ainda"     0.17
    "olland"     0.16
    "-ie"        0.15
    ".gg"        0.15
    Activation Density: 0.249%

    No Known Activations