Explanations
phrases that express personal opinions or stances on societal issues
Negative Logits
ยง          -0.16
вичай       -0.13
éal         -0.13
ruptions    -0.12
생님        -0.12
이션        -0.12
bstract     -0.11
ảy          -0.11
.jetbrains  -0.11
oot         -0.11
Positive Logits
that    0.99
THAT    0.90
That    0.85
that    0.84
That    0.82
that    0.72
_that   0.71
那      0.71
那个    0.65
thats   0.65
Activations Density 2.665%