INDEX
    Explanations

    prepositions followed by determiners

    Negative Logits
    "7"         0.36
    "۶"         0.33
    "4"         0.32
    "3"         0.32
    "6"         0.32
    "9"         0.31
    "with"      0.31
    "5"         0.31
    "8"         0.31
    "2"         0.29
    Positive Logits
    " the"      0.74
    " this"     0.53
    " our"      0.53
    "the"       0.49
    "The"       0.48
    " these"    0.46
    " your"     0.45
    " The"      0.43
    " their"    0.43
                0.41
    Activation Density: 3.725%

    No Known Activations