INDEX
    Explanations
    Negative Logits
    ى  1.61
    ă  1.57
    ка  1.47
    ні  1.45
    teil  1.44
    и  1.39
    olateral  1.34
    avatth  1.34
    ною  1.34
    (no token shown)  1.34
    Positive Logits
    (no token shown)  1.62
    ל  1.55
    اء  1.50
     gerais  1.48
     optimised  1.41
     objeto  1.38
    optimized  1.38
    (no token shown)  1.35
     traje  1.34
    ار  1.33
    Act Density 0.001%

    No Known Activations