INDEX
    Explanations

    examples or instances of a concept or idea

    phrases indicating examples or instances of concepts

    Negative Logits
    Token                   Logit
    "ement"                 -0.76
    "izons"                 -0.73
    "ancies"                -0.73
    " wig"                  -0.69
    "ossier"                -0.66
    "LD"                    -0.65
    "ulum"                  -0.65
    " Debor"                -0.65
    "iets"                  -0.64
    " Tours"                -0.64

    Positive Logits
    Token                   Logit
    " collateral"           0.78
    " fut"                  0.76
    " plagiar"              0.76
    " unintended"           0.72
    " guiActiveUnfocused"   0.72
    " heroism"              0.71
    " tropes"               0.70
    " examples"             0.68
    " pitfalls"             0.68
    " redeem"               0.65
    Act Density 0.094%

    No Known Activations