    Negative Logits
    Token            Logit
    "("              0.24
    (missing)        0.21
    " ("             0.21
    ";"              0.20
    "(),"            0.19
    " newline"       0.19
    "-"              0.19
    "')"             0.19
    (missing)        0.19
    " რომელი"        0.18
    Positive Logits
    Token            Logit
    " beho"          0.29
    " isn"           0.28
    " είναι"         0.27
    " wasn"          0.26
    " really"        0.25
    " doesn"         0.25
    " merupakan"     0.24
    " happens"       0.24
    " is"            0.24
    " involves"      0.24
    Activation Density: 1.253%

    No Known Activations