    Negative Logits
    "f"          0.61
    "c"          0.59
    "9"          0.57
    "losti"      0.54
    "3"          0.51
    "6"          0.50
    " refus"     0.49
    "5"          0.48
    " לאחר"      0.48
    "fy"         0.46
    Positive Logits
    " Leaving"   1.04
    " оставля"   0.99
    " laissé"    0.95
    " leaving"   0.94
    " leave"     0.93
    " Leave"     0.92
    "leaving"    0.90
    " laissent"  0.89
    "留下"       0.86
    " laisse"    0.80
    Activation Density: 0.123%

    No Known Activations