INDEX
    Explanations

    mentions of different languages

    Negative Logits
    Token          Logit
    "urion"        -0.86
    "ecake"        -0.85
    "roxy"         -0.82
    "kus"          -0.82
    "rodu"         -0.81
    "arranted"     -0.80
    "ilts"         -0.80
    "apego"        -0.80
    "olls"         -0.80
    "romeda"       -0.79
    Positive Logits
    Token            Logit
    " spoken"        1.08
    " learners"      1.07
    "language"       1.00
    " language"      0.94
    " interpreter"   0.90
    "ĨĴ"             0.90
    " proficiency"   0.89
    " lear"          0.89
    "anguage"        0.87
    " Languages"     0.86
    Act Density 0.022%

    No Known Activations