INDEX
    Explanations

    false beliefs and self-criticism

    Negative Logits
     практически  0.85
     также  0.81
     максимально  0.79
    ань  0.77
     લગભગ  0.76
    developer  0.76
    стный  0.75
     myös  0.75
     számos  0.74
    ustan  0.74
    Positive Logits
     inferiority  1.06
     disbelief  0.98
     justifies  0.93
     wrongdoing  0.93
     homosexuality  0.93
     untrue  0.92
     superiority  0.91
     wrongly  0.89
     beliefs  0.89
     falsely  0.89
    Activation Density: 0.093%

    No Known Activations