    Explanations
    Negative Logits
     Gus
    -0.07
    ('\\
    -0.06
     cultural
    -0.06
    Tak
    -0.06
    “Well
    -0.06
    -0.06
    -care
    -0.06
    izziness
    -0.06
     altogether
    -0.06
     fertil
    -0.06
    Positive Logits
     Diamond
    0.10
     diamond
    0.09
    анта
    0.08
    Diamond
    0.08
    DAC
    0.07
     diplom
    0.07
     Diamonds
    0.07
    мон
    0.06
    ds
    0.06
    	logging
    0.06
    Act Density 0.002%

    No Known Activations