INDEX
    Explanations

    phrases that express freedom and permissiveness

    Negative Logits
    resh              -0.18
    glob              -0.15
    resa              -0.15
     touch            -0.14
    finished          -0.14
     ung              -0.14
    aris              -0.14
     everlasting      -0.14
    track             -0.14
     Manufacturers    -0.13
    Positive Logits
    ehir              0.19
    tsky              0.16
    UDA               0.15
    ãĥ³ãĥĸ            0.15
    oes               0.15
    acket             0.14
     pressure         0.14
    -automatic        0.14
    ÙĪØ§Ùĩ            0.14
    oa                0.14
    Activation Density: 0.247%

    No Known Activations