INDEX
    Explanations

    phrases related to making things clear or emphasizing specific points

    Negative Logits
    " horizont"   -0.70
    " Niet"       -0.69
    "rought"      -0.60
    " exceeded"   -0.56
    "Root"        -0.56
    "ridor"       -0.56
    "verages"     -0.56
    "cheon"       -0.55
    "athered"     -0.55
    "rity"        -0.55
    Positive Logits
    " noises"        0.83
    " mistake"       0.81
    " disappear"     0.81
    " happen"        0.81
    " debut"         0.79
    " impression"    0.79
    "sense"          0.78
    "ends"           0.76
    " noise"         0.76
    " contribution"  0.73
    Activation Density: 0.721%

    No Known Activations