    Explanations

    references to singular and plural noun forms in context

    Negative Logits
    "twimg"             -0.92
    "OGND"              -0.91
    "RegressionTest"    -0.88
                        -0.72
    " Carn"             -0.72
    " CURR"             -0.68
    "SequentialGroup"   -0.66
    "########."         -0.66
    " BoxDecoration"    -0.65
    "насе"              -0.65

    Positive Logits
    " um"               2.06
    " Um"               1.50
    " uma"              1.35
    "Um"                1.30
    " uh"               1.15
    " Uma"              1.11
    " umm"              0.94
    "Uma"               0.89
    " ums"              0.83
    " umas"             0.82

    Act Density 0.046%

    No Known Activations