INDEX
    Explanations

    mentions of specific names and references in a text

    Negative Logits
    neau
    -0.17
    ament
    -0.15
    oks
    -0.15
     Olsen
    -0.15
    827
    -0.14
    896
    -0.14
    argo
    -0.14
    ernen
    -0.14
    923
    -0.13
     Willow
    -0.13
    Positive Logits
    chied
    0.16
    ugar
    0.16
    ستگÛĮ
    0.15
     سب
    0.15
    elman
    0.14
    éѝ
    0.14
    оди
    0.14
    UED
    0.14
    é²ģ
    0.14
    ῦ
    0.14
    Act Density 0.021%

    No Known Activations