    Explanations

    elements related to historical events and backgrounds of individuals, particularly in academia or sports

    Negative Logits
    Token         Logit
    " were"       -0.25
    " Were"       -0.22
    "Were"        -0.22
    " weren"      -0.21
    "were"        -0.20
    "šli"         -0.16
    " waren"      -0.16
    " Booster"    -0.16
    " сказ"       -0.15
    "itals"       -0.15
    Positive Logits
    Token         Logit
    "ierte"       0.35
    "gte"         0.34
    "igte"        0.33
    "pte"         0.30
    "te"          0.29
    "erte"        0.28
    "zte"         0.28
    "nte"         0.27
    "kte"         0.27
    "onte"        0.27
    Activation Density: 0.025%

    No Known Activations