INDEX
    Explanations

    adverbs of manner and extent

    Negative Logits
     of        0.69
     ruas      0.63
     to        0.62
     ahli      0.60
    (blank)    0.59
     luas      0.55
    (blank)    0.55
     keber     0.55
     akses     0.55
     neuen     0.55
    Positive Logits
    에서도     0.47
    (blank)    0.47
    에도       0.46
    М          0.44
    पणे        0.43
    7          0.41
    (blank)    0.41
    ،          0.40
    С          0.40
    they       0.39
    Act Density 0.502%

    No Known Activations