    Explanations

    phrases and words that indicate analysis or critical evaluation of situations

    Negative Logits
    Token           Logit
    "habi"          -0.15
    "rane"          -0.15
    "seys"          -0.14
    "erland"        -0.14
    "ral"           -0.14
    "rani"          -0.14
    " виду"         -0.14
    " OTHERWISE"    -0.14
    ".obtain"       -0.14
    "εί"            -0.13
    Positive Logits
    Token           Logit
    " again"        0.60
    "again"         0.52
    " Again"        0.51
    "Again"         0.48
    " AGAIN"        0.41
    "又"            0.40
    "AGAIN"         0.39
    "_again"        0.37
    " wieder"       0.35
    " novamente"    0.35
    Activation Density: 0.025%

    No Known Activations