INDEX
    NEGATIVE LOGITS
    Token          Logit
    ".Positive"    -0.08
    "老師"         -0.08
    " Dej"         -0.07
    "afna"         -0.07
    "tempor"       -0.07
    " metro"       -0.07
                   -0.07
    "ctica"        -0.07
    " Temporary"   -0.07
    " profesor"    -0.07
    POSITIVE LOGITS
    Token          Logit
    " GAM"         0.09
    " planes"      0.08
                   0.08
    " plane"       0.08
    "707"          0.08
    " manifold"    0.08
    " rooftop"     0.07
    " fooled"      0.07
    "_sid"         0.07
    "_planes"      0.07
    Activation Density: 0.009%

    No Known Activations