INDEX
    Negative Logits
     walls          -0.09
     envol          -0.09
     murs           -0.08
     salle          -0.08
     combin         -0.08
                    -0.08
                    -0.08
     саме           -0.08
     arf            -0.08
     distracted     -0.08
    Positive Logits
    Excluded         0.09
    -query           0.08
    (query           0.08
    етим             0.08
    _query           0.08
     Query           0.08
    =query           0.08
    _QUERY           0.08
     excluded        0.08
    .query           0.08
    Act Density 0.015%

    No Known Activations