INDEX
    Explanations

    the word "fire" or variants of it

    instances of the word "fire."

    Negative Logits
    "ļé"           -0.82
    "ĸļ"           -0.82
    " carbohyd"    -0.73
    "achusetts"    -0.71
    " cellul"      -0.70
    "Īè"           -0.69
    "romeda"       -0.69
    " nont"        -0.68
    " srf"         -0.68
    "ockets"       -0.67

    Positive Logits
    "nces"         1.00
    "cia"          0.90
    "lli"          0.86
    "ly"           0.86
    "lessly"       0.83
    "ttes"         0.81
    " Dame"        0.79
    "zza"          0.79
    "les"          0.79
    "tto"          0.78

    Act Density 0.014%

    No Known Activations