INDEX
    Explanations

    questions and answers

    Negative Logits
    -unit
    -0.07
    _member
    -0.07
    quan
    -0.07
     threatened
    -0.07
     whe
    -0.07
    -match
    -0.06
    .interval
    -0.06
     unf
    -0.06
    -0.06
     fingert
    -0.06
    Positive Logits
     контра
    0.07
     conservatism
    0.06
     İslâm
    0.06
    .’↵↵
    0.06
     частини
    0.06
    こちら
    0.06
     aşağı
    0.06
     ideologies
    0.06
     квіт
    0.06
    .bb
    0.06
    Act Density 0.079%

    No Known Activations