Explanations
words related to change or transformation
significant actions or processes related to change or transformation
Negative Logits
Its         -0.67
Bah         -0.65
Cub         -0.65
Its         -0.64
rican       -0.59
believes    -0.58
Bron        -0.58
Diane       -0.57
atform      -0.57
Watch       -0.56
Positive Logits
themselves      1.56
prolifer        1.02
their           0.97
individually    0.94
selves          0.94
their           0.89
respectively    0.88
respective      0.87
counterparts    0.87
varying         0.84
Activations Density 0.656%