INDEX
    Explanations

    affirmative responses leading to explanations or actions

    Negative Logits
    inghouse     0.97
    rég          0.96
    ский         0.92
    regulated    0.89
    ح            0.87
     DirectX     0.84
    rales        0.84
    suz          0.84
    sächlich     0.84
     prismatic   0.84
    Positive Logits
    ای                  1.13
    为了                1.03
    特点                1.02
    (token not shown)   1.00
    أي                  0.91
    ă                   0.91
    και                 0.90
    (token not shown)   0.90
    (token not shown)   0.87
    表达                0.87
    Act Density 0.000%

    No Known Activations