    Explanations

    investigate complaints, allegations, deaths

    Negative Logits
    м                  3.06
    م                  2.78
    مة                 2.02
    ت                  1.94
    ਣੀ                 1.89
    х                  1.87
    י                  1.84
    [no token shown]   1.82
    мся                1.81
    ם                  1.81
    Positive Logits
    f                  2.33
    Investigation      2.03
    at                 1.88
    an                 1.82
    ent                1.74
    ad                 1.72
    un                 1.70
    ut                 1.70
    ig                 1.69
    اب                 1.65
    Activation Density: 0.018%

    No Known Activations