INDEX
    Explanations

    alternatives or instead

    Negative Logits
     mysticism      0.45
     confusion      0.44
    confusion       0.40
    OCUS            0.40
    corruption      0.39
     hypocrisy      0.38
     efficace       0.38   (French/Italian: "effective")
     corruption     0.38
    chaos           0.38
     अनियमित        0.38   (Hindi: "irregular")
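    Positive Logits
    代わりに        0.81   (Japanese: "instead")
     вместо         0.80   (Russian: "instead of")
     instead        0.80
    替代            0.77   (Chinese: "substitute, replace")
    代替            0.75   (Chinese/Japanese: "substitute, alternative")
     substitutions  0.73
     замены         0.71   (Russian: "replacements, substitutions")
     substitution   0.70
    取代            0.70   (Chinese: "replace, supersede")
     Instead        0.70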
    Activation Density: 0.159%

    No Known Activations