INDEX
    Explanations

    common sentence starters

    introduces explanations or examples

    Negative Logits
    Token                       Value
    "ला"                        0.49
    "ra"                        0.47
    "h"                         0.47
    (token missing in source)   0.46
    "نا"                        0.46
    "ap"                        0.45
    "ون"                        0.44
    "l"                         0.43
    "k"                         0.43
    "ri"                        0.42
    Positive Logits
    Token     Value
    " was"    0.57
    " be"     0.51
    " had"    0.47
    "ة"       0.45
    " of"     0.44
    " with"   0.44
    " avec"   0.42
    " is"     0.42
    " với"    0.42
    " and"    0.41
    Activation Density: 1.640%

    No Known Activations