    NEGATIVE LOGITS
    "iding"         0.29
    "Protection"    0.28
    "ن"             0.28
    " ="            0.28
    "ствует"        0.28
                    0.27
                    0.27
    "as"            0.26
    "م"             0.26
    "的基础上"       0.25
    POSITIVE LOGITS
    " virtue"       0.79
    " dint"         0.58
    " means"        0.53
    "zantine"       0.52
    " mistake"      0.44
    " Virtue"       0.43
    " necessity"    0.43
                    0.41
    "means"         0.38
    " virtues"      0.36
    Activation Density: 0.074%

    No Known Activations