    Explanations

    references to an authoritative or influential figure, often with a negative connotation of manipulation or deceit

    Negative Logits
    " useStyles"         -0.70
    "AnimationsModule"   -0.66
    " noDo"              -0.62
    " שוליים"            -0.62
    "testify"            -0.59
    "CppCodeGen"         -0.54
    "IntoConstraints"    -0.52
    "DropTable"          -0.52
    "readyState"         -0.51
    "chowa"              -0.50
    Positive Logits
    "hir"     3.88
    "HIR"     1.90
    "Hir"     1.77
    " hir"    1.72
    " Hir"    1.64
    "hiri"    1.20
    "hira"    1.14
    " HIR"    0.95
    "heer"    0.89
    "hirt"    0.83
    Activation Density: 0.001%

    No Known Activations