INDEX
Explanations
references to contemporary cultural trends or happenings
Negative Logits
Ä«    -0.22
ora    -0.22
oc    -0.21
×ķ×    -0.21
os    -0.20
ob    -0.20
Äĵ    -0.20
oin    -0.19
ond    -0.19
oup    -0.19
Positive Logits
ìķĦ    0.27
Õ¡Õ    0.27
á    0.27
ά    0.26
ãĥ£    0.25
аÑĨи    0.24
аÑĤ    0.24
×IJ    0.24
ãĤ¡    0.24
á½±    0.23
Activations Density: 0.040%