INDEX
    NEGATIVE LOGITS
    Token             Value
    " ansiedad"       0.45
    " 불안"           0.43
    "شرين"            0.43
    " Equestrian"     0.43
    "스포츠"          0.40
    " ada"            0.38
    " anxiety"        0.38
    "ائف"             0.38
    " Aug"            0.38
    " Lund"           0.38
    POSITIVE LOGITS
    Token             Value
    "ovol"            0.41
    " overcome"       0.38
    " wilt"           0.37
    "opic"            0.37
    "cob"             0.37
    "ωνα"             0.37
    "ouro"            0.37
    "owls"            0.37
    " prover"         0.37
    " වී"             0.36
    Act Density 0.001%

    No Known Activations