INDEX
    Explanations

    instances where something is demonstrated or illustrated

    Negative Logits
    " brazos"         -0.67
    " lèvres"         -0.66
    " desastre"       -0.59
    "zünd"            -0.59
    " kaynağından"    -0.58
    " pasillo"        -0.57
    " palabra"        -0.57
    "ัพท์"            -0.57
    " kateg"          -0.56
    "SPATH"           -0.56
    Positive Logits
    " Showing"        1.78
    "Showing"         1.74
    " SHOWING"        1.71
    " showing"        1.69
    " Shows"          1.59
    "showing"         1.57
    "Shows"           1.57
    " shown"          1.57
    " shows"          1.52
    " Shown"          1.51
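The logit lists above are most likely the feature's direct effect on output tokens. A minimal sketch, assuming (this page does not confirm it) that each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix, and that each list holds the k most negative or most positive of those values. The function name and its arguments are hypothetical.

```python
from typing import Sequence


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    decoder_dir: the feature's decoder direction, length d_model (assumed input).
    unembed: d_model x vocab_size matrix; unembed[i][j] maps model dim i to token j.
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    # Logit effect on token j = sum over i of decoder_dir[i] * unembed[i][j].
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # k most negative, most negative first
    positive = ranked[::-1][:k]  # k most positive, most positive first
    return negative, positive
```

With a real model, `decoder_dir` would be the feature's row of the SAE decoder and `unembed` the model's unembedding matrix; a dashboard would then print each list as token and value, as above.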
    Activation Density: 0.258%
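The density figure is presumably the share of sampled tokens on which the feature is active. Assuming "active" means an activation strictly above zero (the `threshold` parameter below is a hypothetical knob, not something this page documents), 0.258% works out to roughly 2.6 firing tokens per 1,000. A minimal sketch of that calculation:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: the feature "fires" on a token when its activation is > threshold.
    """
    if len(activations) == 0:
        return 0.0  # no tokens sampled, so report zero density
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)
```

For example, 3 firing tokens out of 1,000 sampled would give a density of 0.003, shown on a dashboard as 0.3%.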

    No Known Activations