INDEX
    Explanations
    Negative Logits
    "،"                     0.75
    (token not rendered)    0.73
    "ly"                    0.71
    (token not rendered)    0.69
    " can"                  0.68
    " scouring"             0.66
    (token not rendered)    0.64
    "۴"                     0.64
    "angs"                  0.64
    "るので"                0.64
    Positive Logits
    "Concept"               1.04
    "CONCEPT"               1.02
    " koncept"              0.96
    " idea"                 0.94
    "概念"                  0.93
    " Concept"              0.91
    "concept"               0.91
    " concept"              0.89
    "0"                     0.88
    " CONCEPT"              0.88
    Act Density 0.079%

    No Known Activations