    Explanations

    prepositional phrase indicators

    Negative Logits
    Token       Logit
    " a"        0.61
    "ו"         0.60
    "ów"        0.57
    "erson"     0.56
    "was"       0.54
    "ids"       0.54
    "art"       0.53
    "V"         0.52
    " was"      0.51
    "verts"     0.51
    Positive Logits
    Token               Logit
    " in"               0.77
    "ين"                0.68
    "daki"              0.52
    " 인해"             0.52
    "ని"                0.50
    "larından"          0.50
    (token not shown)   0.49
    "larını"            0.49
    (token not shown)   0.48
    " في"               0.48
    Activation Density: 2.344%

    No Known Activations