INDEX
    Explanations

    language related to decision-making and consequences

    NEGATIVE LOGITS
    Token               Logit
    "loat"              -0.16
    "ovice"             -0.15
    "ffa"               -0.15
    "nette"             -0.15
    "iscard"            -0.14
    "بری"               -0.14
    "lettes"            -0.14
    "preh"              -0.14
    "-initialized"      -0.13
    " RegexOptions"     -0.13
    POSITIVE LOGITS
    Token               Logit
    " boil"             0.42
    " boils"            0.38
    " boiled"           0.38
    " boiling"          0.33
    " amounts"          0.31
    " Bo"               0.30
    " reduced"          0.30
    "bo"                0.30
    "amount"            0.29
    " amount"           0.29
    Act Density 0.197%

    No Known Activations