    Explanations

    relative clauses and specific entities

    Negative Logits
    "c"     0.91
    "1"     0.84
    "на"    0.84
    "2"     0.75
    "ت"     0.75
    "b"     0.74
    "g"     0.73
    "्री"    0.73
    "ة"     0.71
    "م"     0.68
    Positive Logits
    " Bedingungen"    0.93
    "🕜"              0.88
    " ľud"            0.88
    " obat"           0.87
    " nedenle"        0.84
    "📪"              0.84
                      0.83
    " provinsi"       0.82
    " njia"           0.82
    " giants"         0.82
    Activation Density: 1.573%

    No Known Activations