    Explanations

    mentions or descriptions of things that are physically lightweight or not heavy

    instances of the word "light" or its associations

    Negative Logits
    Token           Logit
    "halla"         -0.98
    "ettings"       -0.84
    "isner"         -0.74
    "utor"          -0.73
    "OPLE"          -0.73
    "apons"         -0.72
    "ij士"          -0.72
    "OUP"           -0.70
    "berus"         -0.69
    " Forbidden"    -0.68
    Positive Logits
    Token           Logit
    " bulb"         1.21
    "hearted"       1.17
    "weights"       1.15
    "bul"           1.15
    "ening"         1.15
    "enment"        1.10
    " bulbs"        1.09
    "lights"        1.04
    "heartedly"     1.02
    "ener"          0.97
    Activation Density: 0.027%

    No Known Activations