    Explanations

    mentions or references to people in various contexts

    references to people and their opinions or behaviors

    Negative Logits
        Token                  Logit
        " srfAttach"           -0.77
        "ãĥ¯"                  -0.71
        "éŃĶ"                  -0.67
        "paralleled"           -0.66
        "inth"                 -0.66
        "yx"                   -0.64
        " Rhodes"              -0.64
        "actory"               -0.63
        " predecessor"         -0.63
        "orthy"                -0.62

    Positive Logits
        Token                  Logit
        " underestimate"        0.99
        " clam"                 0.97
        " underest"             0.96
        " misunderstanding"     0.91
        " misunderstand"        0.91
        " dying"                0.89
        " noticing"             0.89
        " flock"                0.88
        " afraid"               0.87
        " hating"               0.87

    Activation Density: 0.231%

    No Known Activations