INDEX
    Explanations

    attends to tokens related to "good" from tokens related to "fair."

    Head Attr Weights
    0:0.17
    1:0.15
    2:0.08
    3:0.06
    4:0.08
    5:0.04
    6:0.12
    7:0.25
    Negative Logits
    "+:+"               -0.52
    "RunAsync"          -0.50
    "UnsafeEnabled"     -0.49
    "AndroidJUnit"      -0.49
    "InjectAttribute"   -0.49
    "AnchorStyles"      -0.46
    ")_/¯"              -0.45
    "rrggbb"            -0.44
    "verwijspagina"     -0.43
    " للاسماء"          -0.42
    Positive Logits
    " δή"               0.31
    " was"              0.31
    "annis"             0.29
    " is"               0.29
    "iser"              0.28
    "amat"              0.28
    "aso"               0.28
    " pia"              0.28
    "Edel"              0.28
    " zł"               0.27
    Act Density 0.304%

    No Known Activations