INDEX
    Explanations

    references to Mahatma Gandhi and related figures

    Negative Logits
    ENDOR      -0.17
    onte       -0.15
    _NATIVE    -0.15
    entai      -0.15
    etch       -0.14
    lh         -0.14
    argo       -0.14
    izard      -0.14
    sec        -0.14
    orious     -0.14
    Positive Logits
    hil        0.15
    деÑĢ       0.14
     Rhodes    0.14
     Til       0.14
    Ñĥбли      0.14
    éϵ        0.14
    nat        0.14
     tween     0.14
    utral      0.14
    å°¼äºļ     0.14
    Activation Density: 0.008%

    No Known Activations