INDEX
    Explanations

    phrases indicating a series of examples or a list

    phrases indicating the existence or presence of stories and facts

    NEGATIVE LOGITS
    culosis      -0.78
    nesday       -0.78
    oire         -0.78
    iliation     -0.72
    etheless     -0.72
    aea          -0.71
    oyer         -0.70
    ility        -0.70
    icka         -0.69
    エル         -0.69
    POSITIVE LOGITS
    types         0.91
     examples     0.89
     truths       0.87
     constants    0.82
     facts        0.81
     guys         0.80
     kinds        0.79
     caveats      0.78
     types        0.77
     topics       0.76
    Activation Density 0.064%

    No Known Activations