INDEX
    Explanations

    words related to safety, protection, or refuge

    the word "haven" in various contexts

    Negative Logits
    upp         -0.64
    otype       -0.63
    ractical    -0.61
    onel        -0.61
    activation  -0.60
    ahon        -0.60
     thickness  -0.60
    othy        -0.60
    ulu         -0.59
    rophe       -0.59
    Positive Logits
    't          0.96
    geon        0.87
    cheon       0.83
    ned         0.80
    gotten      0.79
    tyard       0.79
    itals       0.79
    ajor        0.77
    itarian     0.75
    edin        0.74
    Activation Density: 0.024%

    No Known Activations