INDEX
Explanations
requests for additional information or learning opportunities
Negative Logits
Token    Logit
dn       -0.15
Fried    -0.15
hood     -0.15
ford     -0.15
waste    -0.14
ft       -0.14
less     -0.14
Stefan   -0.14
isko     -0.14
esh      -0.13
POSITIVE LOGITS
Token    Logit
about    0.29
about    0.23
tentang  0.23
ABOUT    0.22
عنه      0.22
_about   0.22
About    0.19
About    0.19
.about   0.18
-about   0.18
Activations Density 0.024%