INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    "</thead>"      -0.85
    " Nadel"        -0.79
    "สือ"           -0.74
    "mopolitan"     -0.72
    " />\"          -0.71
    " étoile"       -0.70
    " Waterman"     -0.70
    " dandy"        -0.70
    " Navarro"      -0.68
    "fohlen"        -0.67
    POSITIVE LOGITS
    Token           Logit
    " Trip"         1.17
    " trip"         1.14
    "trip"          1.05
    "Trip"          1.05
    " Trips"        0.98
    " trips"        0.93
    "trips"         0.91
    " TRIP"         0.88
    "Trips"         0.85
    "TRIP"          0.78
    Act Density 0.018%

    No Known Activations