INDEX
    Explanations
    Negative Logits
    " beaches"            -0.07
    "imler"               -0.07
    " functionalities"    -0.07
    "iggs"                -0.06
    "amping"              -0.06
    " functionality"      -0.06
    " starch"             -0.06
    "uning"               -0.06
    " proto"              -0.06
    "ou"                  -0.06
    POSITIVE LOGITS
    " ̄ ̄ ̄"                0.07
    "Bob"                 0.07
    " أحد"                0.06
    " потрап"             0.06
    "++;"                 0.06
    "...↵↵↵↵↵↵"           0.06
                          0.06
    "MenuBar"             0.06
    " meslek"             0.06
    " Cannot"             0.06
    Act Density 0.037%

    No Known Activations