    Negative Logits
     öğren
    -0.07
     homosex
    -0.07
     evaluator
    -0.06
    cence
    -0.06
     chiếu
    -0.06
     cười
    -0.06
     						
    -0.06
     environments
    -0.06
    Dice
    -0.06
    eket
    -0.06
    Positive Logits
     vitality
    0.10
     Vital
    0.07
     vital
    0.07
    	stats
    0.07
     TLS
    0.07
    bef
    0.06
    OKIE
    0.06
    gfx
    0.06
    สล
    0.06
    Total
    0.06
    Activation Density: 0.001%

    No Known Activations