    Explanations

    indicators related to discussions of interventions and analysis

    Negative Logits
    AndEndTag           -0.92
    +#+#                -0.82
    WriteTagHelper      -0.78
     CreateTagHelper    -0.75
     BoxFit             -0.74
     JpaRepository      -0.73
    AddTagHelper        -0.72
    aniline             -0.71
    rrggbb              -0.71
    новништво           -0.70
    Positive Logits
    W                   0.46
     umane              0.44
    A                   0.44
    <strong>            0.42
    OrNil               0.41
    ↵↵                  0.41
    [toxicity=0]        0.41
    I                   0.41
    As                  0.41
     esser              0.41
    Activation Density: 0.958%

    No Known Activations