    Negative Logits
    "effect"            -0.08
    " Restart"          -0.08
    "Effect"            -0.08
    " Formatting"       -0.07
    "User"              -0.07
    " Effect"           -0.07
    " Serialization"    -0.07
    "ICAg"              -0.07
    "_restart"          -0.07
    "Retr"              -0.07
    Positive Logits
    " labeled"          0.12
    " anatom"           0.10
    " arrows"           0.10
    " labeling"         0.10
    " labelled"         0.10
    " deline"           0.10
    " annot"            0.09
    " annotate"         0.09
    " calles"           0.09
    0.09
    Activation Density: 0.018%

    No Known Activations