INDEX
    Explanations

    instances of repetition or return phrases in the text

    Negative Logits
    Token              Logit
    " Thereafter"      -0.46
    "attles"           -0.42
    "PRS"              -0.42
    "seg"              -0.41
    "brun"             -0.41
    " deshalb"         -0.41
    " initState"       -0.41
    (not rendered)     -0.41
    "culata"           -0.41
    "ropho"            -0.41
    Positive Logits
    Token              Logit
    "こちらも"          1.20
    "bootstrapcdn"     0.99
    " again"           0.93
    "Again"            0.93
    " szint"           0.93
    "これも"            0.91
    " Again"           0.88
    "同样"              0.88
    "同樣"              0.87
    "again"            0.86
    Activation Density: 0.478%

    No Known Activations