    Explanations
    No Explanations Found
    Negative Logits
    (token not rendered)  -0.08
    "Listener"            -0.08
    " turno"              -0.07
    "Broad"               -0.07
    "reads"               -0.07
    " eh"                 -0.07
    (token not rendered)  -0.06
    "indi"                -0.06
    " usted"              -0.06
    "سعد"                 -0.06
    Positive Logits
    " strcat"             0.07
    "ophon"               0.07
    (token not rendered)  0.07
    " #%"                 0.07
    " shaping"            0.06
    " anonym"             0.06
    " Interaction"        0.06
    "_forum"              0.06
    "lopen"               0.06
    (token not rendered)  0.06
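The two logit lists can be read as the feature's direct effect on the output vocabulary: tokens whose logits the feature pushes down, and tokens it pushes up. A minimal sketch of how such lists are commonly produced, assuming (this page does not confirm it) that the effect is the feature's decoder direction multiplied by the model's unembedding matrix, with any final layer norm ignored. The helper name `top_logit_effects` and the toy numbers are hypothetical.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct logit effect.

    Hypothetical helper. decoder_dir has d_model entries; unembed is a
    d_model x len(vocab) matrix given as a list of rows (W_U).
    Returns (most_negative, most_positive) as lists of (token, effect).
    """
    # Effect on token j is the dot product of the decoder direction
    # with column j of the unembedding matrix: sum_i d[i] * W_U[i][j].
    effects = [
        sum(d_i * row[j] for d_i, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    # Sort ascending by effect (Python's sort is stable).
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # most negative first
    most_positive = ranked[::-1][:k]    # most positive first
    return most_negative, most_positive


# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.1, 0.2, -0.3, 0.0],
    [0.4, -0.2, 0.1, 0.6],
]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print([t for t, _ in neg], [t for t, _ in pos])  # prints ['c', 'd'] ['b', 'a']
```

With a real model the vocabulary has tens of thousands of entries, so only the extremes (here the top and bottom ten) are shown; the small magnitudes in the lists indicate a weak direct effect on any single token.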
    Activation Density: 0.010%
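A density of 0.010% means the feature is active on about 1 in 10,000 token positions. A minimal sketch of that arithmetic, assuming density is the percentage of positions whose activation is strictly above zero over some sample of text (the page states neither the sample nor the threshold); `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold.

    Assumes "fires" means activation > threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One firing position in 10,000 tokens gives a density of 0.01%.
acts = [0.0] * 10_000
acts[42] = 3.5
print(f"{activation_density(acts):.3f}%")  # prints 0.010%
```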

    No Known Activations