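    Negative Logits
    Token           Logit
    "then"          -0.78
    "now"           -0.61
    " now"          -0.59
    " then"         -0.57
    " lalu"         -0.54   (Indonesian: "then")
    "Then"          -0.53
    "herself"       -0.52
    "Now"           -0.51
    " damaligen"    -0.50   (German: "of that time")
    " entonces"     -0.48   (Spanish: "then")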
    Positive Logits
    Token           Logit
    " it"           0.86
    " you"          0.86
    " they"         0.81
    "DockStyle"     0.72
    ","             0.71
    " that"         0.68
    " we"           0.66
    " there"        0.64
    " فريبيس"       0.60
    " consider"     0.59
    Activation Density: 0.037%

    No Known Activations