INDEX
    Explanations

    was followed by descriptors

    Negative Logits
    ي
    0.39
    ون
    0.38
    م
    0.36
    0.36
    خستان
    0.33
    و
    0.33
    ام
    0.32
    י
    0.32
    These
    0.31
    0.31
    Positive Logits
     a
    0.63
     was
    0.48
     to
    0.47
     I
    0.46
     of
    0.45
     it
    0.44
     by
    0.43
     o
    0.42
    ется
    0.42
     p
    0.41
    Activation Density: 0.047%

    No Known Activations