INDEX
    Explanations

    possessive pronouns

    Negative Logits
    -opacity        -0.07
    -goal           -0.07
    =float          -0.07
    =bool           -0.06
    хож             -0.06
    策划            -0.06
    ייב             -0.06
     Besch          -0.06
                    -0.06
    _LA             -0.06
    Positive Logits
     assembling     0.08
     lantern        0.08
     Bureau         0.07
    parate          0.07
    .Typed          0.07
     interpreting   0.07
    здание          0.07
    ffer            0.07
    Utility         0.07
     cop            0.07
    Act Density 0.004%

    No Known Activations