    Negative Logits
        Token            Logit
        "attribute"      -0.08
        "ikipedia"       -0.07
        "John"           -0.07
        " exploiting"    -0.07
        "object"         -0.07
        " لتن"           -0.07
        " locomotive"    -0.07
        "ICY"            -0.07
        " نن"            -0.07
    Positive Logits
        Token                     Logit
        " Insights"               0.10
        "Insights"                0.08
        " insights"               0.08
        (token not rendered)      0.08
        "截图" ("screenshot")      0.08
        " arrep"                  0.08
        " interviews"             0.08
        " snapshot"               0.08
        " screenshots"            0.07
        " Snapshot"               0.07
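Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries per vocabulary token. The page does not state its exact procedure, so the following is a minimal sketch under that assumption; the function `logit_effects`, the toy matrices, and the vocabulary are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: a feature's logit effect is assumed to be its decoder
# direction dotted with each vocabulary token's unembedding column.
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # unembed[i][t]: weight from residual dim i to token t
    vocab: List[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) tokens by logit effect."""
    d_model = len(decoder_dir)
    # One score per vocabulary token: dot product over the residual dimensions.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: 2 residual dimensions, 4 hypothetical tokens.
    direction = [1.0, 0.5]
    w_u = [
        [0.2, -0.1, 0.0, 0.05],
        [0.1, 0.0, -0.4, 0.0],
    ]
    pos, neg = logit_effects(direction, w_u, ["a", "b", "c", "d"], k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.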
    Activation density: 0.001%
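Activation density is usually read as the percentage of sampled tokens on which the feature fires, so 0.001% corresponds to roughly one token in 100,000. The page does not say how many tokens were sampled or what firing threshold was used; the sketch below assumes "fires" means a strictly positive activation, and `activation_density` and its input are hypothetical.

```python
# Hypothetical sketch: density is assumed to be the percentage of sampled
# tokens whose feature activation is strictly positive.
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of entries in `activations` that are greater than zero."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One firing token out of 100,000 sampled tokens.
    acts = [0.0] * 100_000
    acts[42] = 1.3
    print(f"{activation_density(acts):.3f}%")  # prints 0.001%
```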

    No Known Activations