INDEX
    Explanations

    phrases indicating the experience of never doing something

    Negative Logits
    Token (quoted)     Logit
    " always"          -0.65
    " sempre"          -0.58
    " toujours"        -0.58
    (blank)            -0.57
    "حياته"            -0.56
    " constantly"      -0.56
    " πάντα"           -0.54
    " siempre"         -0.53
    " wciąż"           -0.53
    " continually"     -0.52

    Positive Logits
    Token (quoted)     Logit
    "theless"          1.06
    " again"           0.81
    "more"             0.81
    " Again"           0.75
    " ceases"          0.74
    "ending"           0.74
    "again"            0.72
    "Again"            0.67
    "mind"             0.67
    " AGAIN"           0.65
    Activation Density: 0.125%

    No Known Activations