Explanations
statements related to opinions and preferences in various contexts
Negative Logits
Token        Logit
đàn          -0.15
isible       -0.14
lav          -0.14
meille       -0.13
#__          -0.13
лава         -0.13
.AspNet      -0.13
meilleurs    -0.13
adan         -0.13
下载次数     -0.12
Positive Logits
Token        Logit
regular      0.36
medium       0.32
regular      0.32
Regular      0.32
normal       0.31
smaller      0.29
Regular      0.29
standard     0.28
non          0.27
medium       0.27
Activations Density 0.189%