INDEX
    Explanations

    Catalan, Polish, Russian phrases

    Negative Logits
    on    1.16
    x    0.92
    and    0.89
    o    0.86
    ot    0.86
    era    0.84
    os    0.83
    il    0.79
    OS    0.78
    U    0.78
    Positive Logits
    נו    0.89
    ה    0.86
     accompl    0.78
     authorizes    0.78
    ות    0.75
    יו    0.75
     getters    0.74
     assaulted    0.73
     execut    0.72
     expenditures    0.72
    Act Density 0.001%

    No Known Activations