    Explanations

    breakdown with explanations and steps

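    Negative Logits
        " or"           1.12
        (token missing) 1.12
        " atau"         1.05   (Indonesian/Malay "or")
        " أو"           1.05   (Arabic "or")
        " или"          1.03   (Russian "or")
        " ou"           0.97   (French/Portuguese "or")
        (token missing) 0.96
        "or"            0.95
        " یا"           0.95   (Persian "or")
        " किंवा"         0.93   (Marathi "or")
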
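    Positive Logits
        " different"    0.99
        " informative"  0.88
        " 각각"          0.87   (Korean "each")
        " impactful"    0.85
        " maximized"    0.83
        "的主要"         0.83   (Chinese "the main")
        "levels"        0.81
        " maximum"      0.81
        "你需要"         0.80   (Chinese "you need")
        " разных"       0.80   (Russian "different, various")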
    Activation Density: 0.726%

    No Known Activations