INDEX
    Explanations

    languages and language-related terms

    references to languages and multilingual topics

    Negative Logits
    kefeller     -0.78
    apor         -0.76
    romeda       -0.75
    ramid        -0.75
    arranted     -0.74
    Reward       -0.73
    apego        -0.73
     horm        -0.72
    oppable      -0.71
    rolet        -0.71
    Positive Logits
     languages    1.60
     language     1.53
    language      1.49
     diction      1.49
     english      1.41
     Arabic       1.39
     English      1.37
    english       1.34
    English       1.34
     Hindi        1.33
    Activation Density: 0.665%

    No Known Activations