    Explanations

    terms related to distinctions, relationships, and differences between concepts

    Negative Logits
        " Barg"        -0.17
        "alles"        -0.16
        "Ľi"           -0.15
        " Buchanan"    -0.15
        "avec"         -0.15
        "stor"         -0.15
        "dG"           -0.15
        "olle"         -0.15
        " Bias"        -0.15
        " Bloc"        -0.14
    Positive Logits
        " bet"          0.42
        " bew"          0.37
        "bet"           0.32
        " bt"           0.31
        " btw"          0.29
        " b"            0.27
        " bw"           0.27
        " Bet"          0.24
        " be"           0.24
        " beet"         0.24
    Activation Density: 0.100%

    No Known Activations