INDEX
    Explanations

    words that define characteristics

    Negative Logits
    reg            0.51
    αν             0.50   (Greek: "if")
    arc            0.50
    Perspective    0.49
    Sc             0.48
    displacement   0.46
    Cl             0.46
    excess         0.46
    a              0.46
    Islamic        0.44
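    Positive Logits (each token begins with a leading space)
    légende                0.52   (French: "legend")
    oraș                   0.49   (Romanian: "city")
    поведение              0.48   (Russian: "behavior")
    [token not rendered]   0.46
    आलोचना                 0.45   (Hindi: "criticism")
    toddler                0.45
    Preston                0.45
    нова                   0.45   ("new", feminine form)
    любые                  0.44   (Russian: "any")
    mieć                   0.44   (Polish: "to have")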
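    The two logit lists rank vocabulary tokens by how strongly the feature appears to push their output logits down (negative) or up (positive). A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The minimal pure-Python sketch that follows does exactly that. The names `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy numbers are made up for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect of one feature on every vocabulary token.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab_size matrix stored as a list of rows. Entry t of the
    result is the dot product of decoder_dir with column t of unembed.
    """
    vocab_size = len(unembed[0])
    return [
        sum(w * row[t] for w, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(vocab: Sequence[str], effects: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with made-up numbers: d_model = 2, vocabulary of 3 tokens.
vocab = ["a", "b", "c"]
decoder_dir = [1.0, 0.0]
unembed = [[0.5, -0.2, 0.1],
           [9.0, 9.0, 9.0]]
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(vocab, effects, k=1)
print(pos, neg)  # [('a', 0.5)] [('b', -0.2)]
```

    With a real model, vocab_size is in the tens of thousands and k would be 10 to match the lists shown; a vectorized library would replace the nested loops.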
    Activation Density: 0.000%

    No Known Activations
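    Activation density is usually reported as the share of sampled token positions on which the feature fires. A value of 0.000% alongside "No Known Activations" indicates that the feature fired on none (or a vanishingly small share) of the sampled positions. The sketch that follows assumes a firing threshold of zero and a flat list of per-token activations. Both are assumptions, not details stated on this page, and `activation_density` and `format_density` are hypothetical helper names.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def format_density(density: float) -> str:
    """Render a fraction as a percentage with three decimals, e.g. '0.000%'."""
    return f"{density * 100:.3f}%"


# A feature that never fires on the sampled positions (made-up data).
never_fires = [0.0] * 10_000
print(format_density(activation_density(never_fires)))  # 0.000%
```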