INDEX
    Negative Logits
    ".MODEL"      -0.08
    "—with"       -0.07
    "\tscene"     -0.07
    " whore"      -0.06
    " objected"   -0.06
    " flows"      -0.06
    " yaptığ"     -0.06
    " nymph"      -0.06
    " renown"     -0.06
    " Š"          -0.06
    Positive Logits
    " ücretsiz"   0.07
    " Straw"      0.06
    "_negative"   0.06
    " surviv"     0.06
    "_intent"     0.06
    " autor"      0.06
    "Restr"       0.05
    " silah"      0.05
    "_rights"     0.05
    "_aliases"    0.05
    Activation Density: 0.008%

    No Known Activations