INDEX
Explanations
unique or special characters and symbols within text
Negative Logits
=https       -0.15
الهند        -0.14
_PS          -0.14
macOS        -0.14
...          -0.14
-carousel    -0.14
anut         -0.13
Patriot      -0.13
.px          -0.13
impactful    -0.13
Positive Logits
Coffee       0.34
Coffee       0.31
coffee       0.28
coffee       0.26
Coff         0.26
cigarette    0.25
cigarettes   0.24
espresso     0.24
J            0.23
igaret       0.23
Activations Density 0.004%