INDEX
    Explanations

    phrases indicating comparisons or examples

    Negative Logits
    istical      -0.83
    idate        -0.78
    fixed        -0.77
    olate        -0.76
    gap          -0.76
    ettlement    -0.76
    nown         -0.75
    iminary      -0.75
    enser        -0.72
    ensibly      -0.72
    Positive Logits
     Louie        0.92
     Franz        0.89
     Forrest      0.88
     Alfred       0.86
     Jasper       0.86
     Sergio       0.86
     Clive        0.86
     Clint        0.85
     Cowboy       0.85
     Leonardo     0.84
    Activation Density: 0.071%

    No Known Activations