INDEX
    Explanations

    interactions involving offering help or showing empathy

    NEGATIVE LOGITS
    ')],              -0.57
    newtheorem        -0.57
    -->               -0.55
     -->              -0.53
    spesies           -0.53
    [token missing]   -0.52
     Poste            -0.49
    cetype            -0.49
    läufig            -0.48
    ológ              -0.48
    POSITIVE LOGITS
    ImageContext      0.80
    Према             0.69
     المعيارى          0.68
     AspNetCore       0.67
     '\\;'            0.67
    AndEndTag         0.65
     stanovnika       0.64
    setopt            0.63
    ExtendWith        0.62
    ✭✭                0.61
    Activation Density: 0.506%

    No Known Activations