INDEX
    Explanations

    references to comparisons or similarities between concepts or entities

    Negative Logits
    "anik"             -0.17
    "stan"             -0.17
    ".Localization"    -0.16
    "lsruhe"           -0.14
    "unny"             -0.14
    "立て"             -0.14
    "cts"              -0.14
    "oras"             -0.14
    "リーズ"           -0.14
    "abi"              -0.13
    Positive Logits
    " например"        0.20
    " například"       0.19
    "例如"             0.18
    "例"               0.17
    " مثلا"            0.17
    " наприклад"       0.15
    " např"            0.15
    " exemp"           0.15
    "یره"              0.15
    " quelle"          0.14
    Act Density 0.101%

    No Known Activations