INDEX
    Explanations

    frequent pronouns and articles in sentences

    Negative Logits
        Token               Logit
        "s"                 -1.07
        " Witt"             -0.86
        "ness"              -0.81
        "ses"               -0.79
        " Ej"               -0.74
        "nya"               -0.73
        "böz"               -0.73
        (unrendered token)  -0.73
        " Assisi"           -0.72
        "Haf"               -0.72
    Positive Logits
        Token               Logit
        " aDecoder"          0.97
        " στη"               0.93
        " صوتيه"             0.88
        " Bue"               0.86
        " detainees"         0.84
        " Parke"             0.84
        " τη"                0.82
        " quoique"           0.81
        " Marconi"           0.81
        "ΤΗ"                 0.81
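    Logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and sorting the resulting per-token scores. The dashboard does not state its method, so the sketch below assumes that approach; the function name top_logit_effects, the arguments w_dec_row and w_unembed, and the toy data are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature direction pushes their logits.

    Assumption (not confirmed by the dashboard): each token's score is the dot
    product of the feature's decoder direction (shape: d_model) with that
    token's column of the unembedding matrix (shape: d_model x vocab_size).
    """
    logit_weights = w_dec_row @ w_unembed          # one score per vocabulary token
    order = np.argsort(logit_weights)              # indices sorted ascending
    negative = [(vocab[i], float(logit_weights[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_weights[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -1.0, 0.2, 0.9],
                      [0.3,  0.3, 0.3, 0.3]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, ["a", "b", "c", "d"], k=2)
print("negative:", neg)
print("positive:", pos)
```

    Sorting once and slicing from both ends yields the negative and positive lists from a single pass over the vocabulary.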
    Activation density: 0.066%
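    Activation density is usually reported as the share of sampled token positions on which the feature fires. At 0.066%, that would be roughly 66 of every 100,000 tokens. The sketch below assumes "fires" means an activation above a threshold of zero; the function name activation_density and the synthetic data are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic example: 66 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:66] = 1.0
print(f"{activation_density(acts):.3f}%")
```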

    No Known Activations