INDEX
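    Negative Logits
    Token              Logit
    " anxiety"         -0.07
    " infiltration"    -0.07
    " cote"            -0.07
    " rally"           -0.07
    " Este"            -0.07   (Spanish/Portuguese: "this" or "east")
    " खो"              -0.07   (Hindi: "lose")
    " intimacy"        -0.07
    " novels"          -0.07
    " corta"           -0.07   (Spanish/Portuguese: "cuts" or "short")
    " profunda"        -0.06   (Spanish/Portuguese: "deep")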
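    Positive Logits
    Token              Logit
    [token missing]     0.10
    "循环"              0.09   (Chinese: "cycle" or "loop")
    [token missing]     0.08
    " polygons"         0.08
    " слой"             0.08   (Russian: "layer")
    "_LAYER"            0.08
    [token missing]     0.08
    " cartilage"        0.08
    " Hanson"           0.08
    "мира"              0.08   (Russian: "of the world" or "of peace")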
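    Logit lists like the two above are commonly produced by multiplying a feature's decoder direction by the model's unembedding matrix and reading off the most boosted and most suppressed tokens. The sketch below assumes that method; it is not confirmed by this page. The function name `feature_logit_effects`, the nested-list inputs laid out as [d_model][vocab_size], and the toy numbers in the usage note are all hypothetical, and any final LayerNorm or scaling is ignored.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_positive, top_negative) lists of (token, logit) pairs.

    Hypothetical helper. decoder_vec has length d_model; unembed is assumed
    to be laid out as [d_model][vocab_size]. Final LayerNorm is ignored.
    """
    d_model = len(decoder_vec)
    vocab_size = len(vocab)
    if len(unembed) != d_model or any(len(row) != vocab_size for row in unembed):
        raise ValueError("unembed must be shaped [d_model][vocab_size]")

    # Direct effect on each token's logit: decoder_vec @ unembed.
    logits = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

    # Token ids ordered from most suppressed to most boosted.
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    k = max(0, min(k, vocab_size))
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[vocab_size - k:])]
    return positive, negative
```

    For a toy two-dimensional model with decoder_vec = [1.0, -1.0], unembed = [[0.5, 0.0, -0.2], [0.1, 0.3, 0.2]] and vocab = ["a", "b", "c"], calling it with k=1 returns "a" as the top positive token and "c" as the top negative one. A real model would use array libraries rather than Python loops; plain lists keep the sketch dependency-free.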
    Activation Density 0.002%

    No Known Activations
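    Activation density is usually read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the threshold and token sample behind the 0.002% figure are not stated on this page, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not this page's definition.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one sampled activation")
    return 100.0 * active / total
```

    Under this definition, 0.002% corresponds to about 2 active tokens in every 100,000 sampled, which is consistent with no activation examples having been recorded.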