INDEX
Explanations
phrases indicating comparisons or evaluations of things or ideas
Negative Logits
itself
-0.28
was
-0.21
它
-0.20
的一个
-0.19
was
-0.18
its
-0.17
Its
-0.16
wasn
-0.15
Its
-0.15
оно
-0.15
Positive Logits
themselves
0.45
ones
0.38
examples
0.32
those
0.31
nt
0.30
are
0.30
exceptions
0.29
originals
0.28
reminders
0.28
favorites
0.28
Activations Density 0.487%