INDEX
Explanations
abstract concepts and descriptions
Negative Logits
the     0.60
be      0.42
1       0.41
)       0.41
to      0.41
for     0.40
or      0.40
that    0.38
User    0.38
can     0.38
Positive Logits
meriye      0.45
ແລະ         0.42
револю      0.41
4           0.41
۵           0.41
alegría     0.40
искусства   0.40
៥           0.40
REGIUNI     0.40
vykor       0.40
Activations Density 0.072%