INDEX
    Explanations

    describing what something is

    Negative Logits
    分开    0.35
    [token missing]    0.34
     aident    0.34
     fraternity    0.33
     помогут    0.31
     جلوگیری    0.31
    [token missing]    0.30
    ուս    0.30
     nari    0.30
    [token missing]    0.30
    Positive Logits
     contains    0.79
     occupies    0.78
     possesses    0.77
     represents    0.74
     corresponds    0.72
     behaves    0.72
     contain    0.69
     occupy    0.68
     consists    0.66
     behave    0.65
    Activation Density: 0.133%

    No Known Activations