INDEX
Explanations
comparative phrases that illustrate similarities or examples
Negative Logits
istrovství
-0.15
]={↵
-0.14
ーツ
-0.14
auce
-0.14
anim
-0.14
cken
-0.14
št
-0.14
/mol
-0.14
ulumi
-0.13
ısından
-0.13
Positive Logits
ours
0.23
this
0.18
yours
0.18
these
0.16
anner
0.16
vậy
0.16
hers
0.15
ily
0.15
esto
0.15
상
0.14
Activations Density 0.041%