INDEX
    Explanations

    phrases indicating changes in settings or parameters

    Preceding words related to changes in magnitude

    change to or movement to

    Negative Logits
    Token               Logit
     AssemblyTitle      -0.66
    ElementException    -0.54
     AssemblyCompany    -0.54
    CONSIN              -0.50
    GIVEREF             -0.49
    OrFail              -0.47
    Чем                 -0.47
    }^{*}(              -0.44
    dients              -0.43
    sef                 -0.42
    Positive Logits
    Token               Logit
     to                 1.36
     into               1.07
    (token not shown)   0.99
     إلى                0.93
    (token not shown)   0.92
     menjadi            0.89
     naar               0.84
    เหลือ                0.83
     kepada             0.82
     到                 0.75
    Act Density 0.565%
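
    The density figure above is read here as the fraction of token positions on which the feature has a non-zero activation. That reading is an assumption, since the page does not define the term. A minimal sketch of that calculation follows. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for eight token positions (illustrative only).
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.565% would correspond to the feature firing on roughly 1 in 177 tokens.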

    No Known Activations