INDEX
    Explanations

    "What is" or "What's" questions

    Negative Logits
    " contributed"      0.49
    " influenced"       0.48
    " contributes"      0.47
    "Ы"                 0.45
    " distinguishes"    0.45
    " elevates"         0.44
    " constitutes"      0.44
    " the"              0.43
    " differentiates"   0.43
    " complicates"      0.42

    Positive Logits
    " Stakes"           0.43
    "धीरे"               0.42
    " Wählen"           0.41
    " Germans"          0.41
    " nouă"             0.40
    " indef"            0.39
    " बंधन"              0.39
    " scegliere"        0.39
    " escoger"          0.39
    "別人"              0.39
    Activation Density: 0.006%

    No Known Activations