INDEX
    Explanations

    actions ending in "-ation"

    Negative Logits
    ના    0.75
    د    0.72
    ยัง    0.64
    (no token shown)    0.63
     muut    0.62
    I    0.62
    ión    0.61
    માં    0.61
    یت    0.61
    ことを    0.60
    Positive Logits
    is    1.02
    ти    0.98
    x    0.94
    т    0.94
    ר    0.89
    н    0.83
    tr    0.82
    us    0.81
    n    0.81
    ت    0.81
    Activation Density: 0.156%

    No Known Activations