    Explanations

    phrases indicating significant changes or consequences in various contexts

    Negative Logits
        Token          Logit
        " brakes"      -0.15
        "ırak"         -0.14
        "itude"        -0.14
        "ÙIJÙĬ"        -0.14
        "_combined"    -0.14
        "éľ"           -0.14
        "енÑĮ"         -0.14
        " веÑĤ"        -0.13
        "tega"         -0.13
        " ÙģÙĤ"        -0.13

    Positive Logits
        Token          Logit
        "haus"          0.19
        "iges"          0.16
        "omore"         0.15
        "velt"          0.15
        "urat"          0.15
        "á»įng"         0.15
        "aos"           0.15
        "á»Ń"           0.15
        "ãĤ¦ãĥĪ"        0.14
        "aset"          0.14
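    The two lists presumably rank vocabulary tokens by how strongly this feature pushes each token's output logit down (negative) or up (positive). Below is a minimal sketch of building such top-k lists. It assumes you already have one effect value per token, for example the feature's decoder direction multiplied by the model's unembedding matrix, which is an assumption about how this page computes them. The function name `top_logits` is hypothetical.

```python
def top_logits(tokens, logit_effect, k=10):
    """Hypothetical helper: return the k most negative and k most positive
    (token, effect) pairs. logit_effect[i] is assumed to be the feature's
    direct contribution to the logit of tokens[i]."""
    if len(tokens) != len(logit_effect):
        raise ValueError("tokens and logit_effect must have the same length")
    pairs = list(zip(tokens, logit_effect))
    # Most negative effects first, for the "Negative Logits" list
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Most positive effects first, for the "Positive Logits" list
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Usage with a few values copied from the lists above
tokens = [" brakes", "haus", "iges", "itude", "aset"]
effects = [-0.15, 0.19, 0.16, -0.14, 0.14]
neg, pos = top_logits(tokens, effects, k=2)
print(neg)  # [(' brakes', -0.15), ('itude', -0.14)]
print(pos)  # [('haus', 0.19), ('iges', 0.16)]
```

    Leading spaces are kept inside the token strings because " brakes" and "brakes" are distinct vocabulary entries.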
    Activation Density: 0.113%
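    Activation density is the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The page does not state its threshold, and `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold.
    The default threshold of 0.0 is an assumption, not taken from this page."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Example: a feature active on 113 of 100,000 tokens
acts = [1.0] * 113 + [0.0] * (100_000 - 113)
print(f"{activation_density(acts) * 100:.3f}%")  # 0.113%
```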

    No Known Activations