    Explanations

    phrases that indicate limitations or restrictions in various contexts

    Negative Logits
    Token         Logit
    "/misc"       -0.15
    "urd"         -0.15
    "urch"        -0.14
    " partly"     -0.14
    " misc"       -0.13
    "лон"         -0.13
    "ittel"       -0.13
    " Stall"      -0.13
    "̣"            -0.13
    "935"         -0.12
    Positive Logits
    Token         Logit
    " only"       0.72
    " ONLY"       0.65
    " limited"    0.64
    "only"        0.62
    " restricted" 0.58
    "Only"        0.57
    "limited"     0.57
    "ONLY"        0.56
    " Only"       0.56
    " confined"   0.54
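The positive and negative logit lists above rank vocabulary tokens by how much this feature pushes their logits up or down. A minimal sketch of how such a ranking could be produced, assuming (as is common for sparse-autoencoder dashboards, but not confirmed by this page) that a feature's logit effect is its decoder vector multiplied by the model's unembedding matrix. The names `top_logit_effects`, `w_dec`, and `w_u` are hypothetical, and the toy weights are illustrative numbers, not this feature's real values.

```python
import numpy as np

def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs.

    Assumption: logit effect = decoder vector (d_model,) @ unembedding
    matrix (d_model, d_vocab). Both arguments are hypothetical inputs.
    """
    effects = w_dec @ w_u                 # one effect value per vocabulary token
    order = np.argsort(effects)           # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (made-up weights).
vocab = [" only", " misc", " limited", "urd"]
w_u = np.array([[2.0, -1.0, 1.0, -3.0],
                [5.0, 5.0, 5.0, 5.0]])
w_dec = np.array([1.0, 0.0])              # feature only reads the first dimension

pos, neg = top_logit_effects(w_dec, w_u, vocab, k=2)
print(pos)  # [(' only', 2.0), (' limited', 1.0)]
print(neg)  # [('urd', -3.0), (' misc', -1.0)]
```

Sorting once with `argsort` and reading both ends yields the two lists from a single pass over the vocabulary.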
    Activation Density: 0.462%
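Activation density is the share of token positions on which the feature fires; 0.462% corresponds to roughly 4.6 tokens per 1,000. A minimal sketch, assuming the feature counts as "firing" whenever its activation is strictly greater than zero (the dashboard's exact threshold and sample dataset are not stated on this page). The function name `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: a threshold of 0.0, i.e. any positive activation counts.
    """
    acts = np.asarray(activations, dtype=float)
    return np.count_nonzero(acts > threshold) / acts.size

# Toy example: 1,000 token positions, the feature fires on 5 of them.
acts = np.zeros(1000)
acts[:5] = [0.3, 1.2, 0.8, 0.1, 2.0]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.500%
```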

    No Known Activations