    Negative Logits
    " It"  0.91
    "ции"  0.75
    "𝘴"  0.61
    "шения"  0.61
    "нд"  0.60
    "ру"  0.60
    (blank)  0.59
    "шение"  0.59
    "ן"  0.58
    "ных"  0.57
    Positive Logits
    "et"  0.98
    " riesgos"  0.95
    "al"  0.94
    " risks"  0.93
    "7"  0.90
    " risk"  0.89
    "ak"  0.86
    " खतरा"  0.86
    " riesgo"  0.84
    "or"  0.83
    Activation Density: 0.137%

    No Known Activations