INDEX
Explanations
words associated with measuring and analyzing performance or conditions
Negative Logits
the                  -0.87
(token not shown)    -0.78
can                  -0.76
also                 -0.76
he                   -0.75
have                 -0.73
about                -0.73
all                  -0.73
it                   -0.72
are                  -0.71
Positive Logits
enfans               0.92
Theſe                0.89
feroit               0.88
Houſe                0.88
pngtree              0.88
ſche                 0.88
itſelf               0.87
myſelf               0.86
avoient              0.85
igång                0.84
Activations Density: 4.697%