INDEX
    Explanations
    NEGATIVE LOGITS
    Token                  Logit
    " здат"                -0.08
    " Nelson"              -0.08
    "<|end_of_text|>"      -0.07
    " dragged"             -0.07
    " lem"                 -0.07
                           -0.07
    "WithData"             -0.07
    "-volume"              -0.06
    " Anadolu"             -0.06
    " poco"                -0.06

    POSITIVE LOGITS
    Token                  Logit
    " Fair"                 0.20
    " fair"                 0.18
    "Fair"                  0.16
    "fair"                  0.12
    " fairness"             0.10
    " fairly"               0.10
    " unfair"               0.09
    " unfairly"             0.09
    "air"                   0.09
    " Fairy"                0.08

    Activation Density: 0.009%

    No Known Activations