INDEX
    Explanations

    ular, secular

    Negative Logits
    Token         Logit
     sunny        -0.08
     cone         -0.07
     الأ          -0.07
     unclear      -0.07
    老师          -0.07
    visor         -0.07
    (not shown)   -0.07
     sofa         -0.07
     да           -0.07
    ونکي          -0.07

    Positive Logits
    Token         Logit
    /non          0.08
    -minded       0.08
     justice      0.08
    _attrs        0.08
     Warfare      0.08
     власти       0.07
    дік           0.07
     encycl       0.07
    mein          0.07
    力量          0.07

    Activation Density 0.004%

    No Known Activations