INDEX
    Explanations

    phrases indicating variability or frequency in different contexts

    Negative Logits
    isObject     -0.15
    334          -0.15
     restraint   -0.14
     :↵↵↵↵       -0.14
    ir           -0.14
    ÙĪØº         -0.14
    ahl          -0.13
    umer         -0.13
    yne          -0.13
    Ī            -0.13
    Positive Logits
    orious       0.15
    ANCE         0.15
    ussen        0.14
    ladu         0.14
    inton        0.14
    ISTER        0.14
    #error       0.14
    Ŀ            0.14
     cases       0.14
    μία          0.14
    Act Density 0.073%

    No Known Activations