    Explanations

    references to specific documents or articles, particularly those that include categorizations, proof, or examples

    Negative Logits
     telinga  -0.44
     extranjera  -0.39
     colectiva  -0.38
     istrinya  -0.35
     ibunya  -0.33
     dolayı  -0.33
     keluarganya  -0.33
     turística  -0.33
     imaginación  -0.32
     suaminya  -0.31
    Positive Logits
     ſind  1.19
    ſelf  1.17
     ſei  1.14
    featureID  1.14
    <pad>  1.13
    <unused43>  1.12
    <unused42>  1.11
    <unused41>  1.11
    <unused8>  1.10
    <unused23>  1.10
    Activation Density: 1.063%

    No Known Activations