INDEX
    Explanations

    phrases that begin or initiate a sentence

    Negative Logits
    Token          Logit
     research      -0.54
    WithMany       -0.53
     people        -0.52
    רים            -0.51
    /***/          -0.49
     sub           -0.49
     text          -0.48
     types         -0.48
     type          -0.48
    iers           -0.47
    Positive Logits
    Token              Logit
     propOrder         0.94
    setVerticalGroup   0.77
     <=",              0.77
    новниш             0.76
     himſelf           0.75
    IntoConstraints    0.74
     myſelf            0.74
     themſelves        0.73
    RegressionTest     0.73
    expandindo         0.73
    Activation Density: 0.245%

    No Known Activations