INDEX
    Explanations

    various forms of the word "reason" and related concepts of purpose and justification

    Negative Logits
    Token      Logit
    "мир"      -0.07
    "anka"     -0.07
    "riel"     -0.06
    "ESA"      -0.06
    "ç͍çļĦ"   -0.06
    "iller"    -0.06
    "ikan"     -0.06
    " erk"     -0.06
    "vit"      -0.06
    "jet"      -0.06
    Positive Logits
    Token      Logit
    " alone"   0.09
    "alone"    0.09
    "之一"     0.09
    "among"    0.07
    "annes"    0.07
    " sake"    0.07
    " وغير"    0.07
    "NAL"      0.07
    " among"   0.07
    "nement"   0.07
    Activation Density: 0.003%

    No Known Activations