    Explanations

    references to specific locations and relationships

    Negative Logits
    Token       Logit
    "patch"     -0.19
    " patch"    -0.18
    " Patch"    -0.15
    "rouw"      -0.15
    " Ord"      -0.15
    " Arr"      -0.15
    "Patch"     -0.14
    " Hoy"      -0.14
    "еред"      -0.14
    "afen"      -0.14
    Positive Logits
    Token       Logit
    " nail"     0.18
    " nose"     0.18
    " ima"      0.18
    " дол"      0.18
    " rodi"     0.17
    " se"       0.17
    " може"     0.17
    " деле"     0.17
    " mo"       0.17
    " нал"      0.16
    Activation Density: 0.002%

    No Known Activations