    Negative Logits
    Token            Logit
    " gar"           -0.08
    "Wa"             -0.07
    " BB"            -0.07
    " turbulence"    -0.07
    "_dims"          -0.07
    " adjacency"     -0.07
    "ty"             -0.07
    "Kol"            -0.07
    "Lam"            -0.07
    "EP"             -0.07
    Positive Logits
    Token            Logit
    "-orang"         0.09
    "gående"         0.08
    " gero"          0.08
    " mined"         0.08
    "дущ"            0.08
    " ger"           0.08
    "gef"            0.08
    " بدء"           0.08
    " bron"          0.08
    " водитель"      0.08
    Activation Density: 0.041%

    No Known Activations