    Negative Logits
    Token               Logit
    " counter"          -0.09
    "counter"           -0.08
    "_COUNTER"          -0.08
    " subtle"           -0.08
    " adam"             -0.08
    "Nothing"           -0.08
    "Counter"           -0.08
    "aking"             -0.08
                        -0.07
    "成年人"            -0.07

    Positive Logits
    Token               Logit
    " eigenschappen"    0.09
    " ფუნქ"             0.09
    " notation"         0.08
    " properties"       0.08
    " форму"            0.08
    " aanwezig"         0.08
    " היא"              0.08
    " Shelter"          0.08
    " 자리"             0.08
    " בצורה"            0.08
    Act Density 0.012%

    No Known Activations