INDEX
    Explanations

    instances of significant physical interactions and emotional responses

    Negative Logits
     Yourself
    -0.44
     yourself
    -0.42
     yourselves
    -0.41
     himself
    -0.39
     myself
    -0.37
     Himself
    -0.36
     ourselves
    -0.36
     herself
    -0.33
     themselves
    -0.32
    him
    -0.28
    Positive Logits
     seu
    0.47
     sua
    0.42
     his
    0.42
     seus
    0.39
     suas
    0.37
     suo
    0.35
     her
    0.34
     their
    0.32
    她的
    0.31
     seine
    0.31
    Act Density 0.119%

    No Known Activations