INDEX
    Explanations
    Negative Logits
    " ков"        -0.07
    " tuyển"      -0.07
    "šetření"     -0.07
                  -0.07
    " kraje"      -0.06
    "_center"     -0.06
    " deniz"      -0.06
    " строитель"  -0.06
    "anguage"     -0.06
    "ництво"      -0.06
    Positive Logits
    " logic"      0.10
    "Logic"       0.09
    " Logic"      0.08
    "(Max"        0.07
    "logic"       0.07
    ".Logic"      0.07
    " Context"    0.07
    ".logic"      0.07
    "Eff"         0.07
    " HUD"        0.07
    Activation Density: 0.009%

    No Known Activations