    Explanations

    references to individuals in leadership positions

    Negative Logits
    Token           Logit
    "<bos>"         -0.49
    " hobby"        -0.43
    " Kobayashi"    -0.42
    " minat"        -0.40
    "box"           -0.40
    " roommate"     -0.40
    " illeg"        -0.40
    "lib"           -0.40
    " mayhem"       -0.40
    " box"          -0.39
    Positive Logits
    Token           Logit
    " leaders"      1.65
    "Leaders"       1.59
    " Leaders"      1.59
    "leaders"       1.48
    " líderes"      1.14
    " pemimpin"     0.93
    " Leadership"   0.87
    " liderança"    0.87
    "Leadership"    0.85
    " dirigeants"   0.85
    Activation Density: 0.006%

    No Known Activations