    Negative Logits
    Token              Value
    " minority"        0.50
    " persuasion"      0.50
    " lidí"            0.50
    "fal"              0.49
    " cultivating"     0.49
    " so"              0.48
    " fifty"           0.48
    " storytelling"    0.48
    " psychiatrist"    0.48
    " conversar"       0.48
    Positive Logits
    Token              Value
    "й"                0.65
    "Islamic"          0.64
    "д"                0.60
    " Bereiche"        0.57
    "доне"             0.56
    "łach"             0.56
    " ابتد"            0.55
    " ஆரம்ப"           0.54
    "స్తాయి"           0.54
    "olines"           0.54
    Activation Density: 0.011%

    No Known Activations