INDEX
    Explanations

    phrases that indicate various types of actions or responses related to lists or categorization

    Negative Logits
     mostly
    -0.19
     всех
    -0.19
     generally
    -0.18
    通常
    -0.18
    何か
    -0.18
    mostly
    -0.18
     always
    -0.17
     většinou
    -0.17
     largely
    -0.17
     Mostly
    -0.17
    Positive Logits
     even
    0.40
     simply
    0.35
    even
    0.34
     sogar
    0.34
     outright
    0.32
    甚至
    0.32
     downright
    0.31
     dokonce
    0.29
     Even
    0.27
    Even
    0.27
    Act Density 0.490%

    No Known Activations