INDEX
    Explanations

    code syntax and programming constructs

    NEGATIVE LOGITS
    (__('            -0.67
     ['$             -0.61
     Weiss           -0.61
     '@/             -0.61
     Appel           -0.61
     Hess            -0.60
    éte              -0.59
     FontWeight      -0.58
    jina             -0.58
    lobo             -0.58
    POSITIVE LOGITS
                     1.27
    </tr>            1.13
    ↵↵               1.05
    <eos>            0.99
    [toxicity=0]     0.88
    ↵↵↵              0.88
    ↵↵↵↵↵            0.84
    ↵↵↵↵             0.79
    ())))            0.77
    ↵↵↵↵↵↵           0.76
    Act Density 0.451%

    No Known Activations