INDEX
    Explanations

    actions indicating agency, particularly in contexts of ability or causing events

    Negative Logits
    Token        Logit
    " are"       -0.64
    "сюда"       -0.62
    "*/"         -0.62
    ""           -0.62
    " is"        -0.61
    " himo"      -0.61
    " keeps"     -0.56
    " ignores"   -0.55
    " makes"     -0.55
    " will"      -0.54
    "*/)"        -0.54
    Positive Logits
    Token        Logit
    " wasn"      0.94
    "Wasn"       0.88
    "Was"        0.87
    "wasn"       0.86
    "weren"      0.86
    " było"      0.86
    " Wasn"      0.85
    " weren"     0.81
    "was"        0.81
    " was"       0.79
    Activation Density: 1.365%

    No Known Activations