INDEX
Explanations
phrases related to notable achievements or features
Negative Logits
a        -0.59
neus     -0.59
oldo     -0.57
tarko    -0.57
two      -0.55
يده      -0.54
sebuah   -0.52
sworn    -0.52
Unary    -0.51
Fiske    -0.51
Positive Logits
المعيارى   0.93
paravant   0.81
的一些      0.79
一些        0.77
Himo       0.75
)"),       0.75
enfance    0.74
findall    0.72
SOME       0.72
Tikang     0.72
Activations Density: 0.105%