INDEX
    Explanations

    references to the concept of an "ideal" in various contexts

    Negative Logits
    eldon       -0.15
    dale        -0.15
    assen       -0.15
     Howard     -0.15
    kd          -0.14
    dio         -0.14
    oran        -0.14
    ktor        -0.14
    /how        -0.14
     than       -0.13
    Positive Logits
    istic        0.19
    mente        0.18
    istically    0.16
    ably         0.16
    ivil         0.15
    iminal       0.15
    cala         0.15
    conditions   0.15
    imal         0.15
    iterals      0.15
    Act Density 0.031%

    No Known Activations