INDEX
    Explanations

    sensitive themes and difficult emotions

    Negative Logits
        Token              Score
        " hints"           0.38
        " sneaky"          0.36
        " snappy"          0.36
        " phishing"        0.35
        " deforestation"   0.35
        " tweaking"        0.35
        " alphabetical"    0.34
        " overarching"     0.34
        " flashbacks"      0.34
        " hikes"           0.34
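    Positive Logits
        Token              Score
        " dynamics"        0.41
        "Dynamics"         0.40
        " realities"       0.38
        " אות"             0.36   (Hebrew: "letter, sign")
        "feira"            0.35   (Portuguese: "fair, market")
        "liš"              0.35
        "dynamics"         0.35
        "结论"             0.33   (Chinese: "conclusion")
        "ਰੇ"               0.33
        " proportions"     0.33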
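The page does not say how the two logit lists are produced. A common approach for sparse-autoencoder features, assumed here and not confirmed by the source, is to project the feature's decoder direction onto the model's unembedding matrix and rank tokens by the resulting dot product. The sketch below shows that ranking in pure Python. `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the 2-d vectors are made-up toy values, not the model's weights.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector, then return the
    k highest-scoring (promoted) and k lowest-scoring (suppressed) tokens."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most suppressed, lowest score first
    positive = ranked[::-1][:k]  # most promoted, highest score first
    return positive, negative


# Toy 2-d example: vectors are invented for illustration only.
decoder_dir = [1.0, 0.0]
unembed = {
    " dynamics": [2.0, 1.0],
    " realities": [1.0, 0.0],
    " hints": [-1.0, 3.0],
    " sneaky": [-2.0, 0.5],
}
pos, neg = logit_effects(decoder_dir, unembed, k=2)
print(pos)  # [(' dynamics', 2.0), (' realities', 1.0)]
print(neg)  # [(' sneaky', -2.0), (' hints', -1.0)]
```

In this sketch, suppressed tokens come out with negative scores. Keeping the leading space in each token key matters, since `" dynamics"` and `"dynamics"` are distinct vocabulary entries, as both appear in the list above.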
    Activation Density: 0.029%
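"Activation density" is read here as the fraction of sampled token positions on which the feature fires. That reading, and the zero threshold, are assumptions, since the page gives only the number. Under that reading, 0.029% corresponds to roughly 29 firing positions per 100,000 tokens. The helper name `activation_density` and the synthetic activations below are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`
    (a zero threshold is an assumption, not taken from the page)."""
    if not activations:
        raise ValueError("need at least one token position")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic activations: 29 nonzero positions out of 100,000 tokens.
acts = [0.0] * 99_971 + [1.5] * 29
density = activation_density(acts)
print(f"{density:.3%}")  # 0.029%
```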

    No Known Activations