INDEX
Explanations
terms related to qualitative descriptions or judgments about people, situations, or items
Negative Logits
ulp        -0.17
allon      -0.15
oden       -0.14
837        -0.14
рован      -0.13
Fucking    -0.13
atively    -0.13
оди        -0.13
443        -0.13
831        -0.13

Positive Logits
ones       0.39
Ones       0.30
ones       0.27
stuff      0.26
Stuff      0.22
iest       0.21
portion    0.20
stuff      0.20
部分       0.20
liest      0.18
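Token lists taken from byte-level BPE vocabularies are often shown in raw byte-mapped form, so a token like 部分 ("portion") can appear as `éĥ¨åĪĨ` and рован as `ÑĢован`. The following is a minimal sketch of turning such strings back into readable text. It assumes the vocabulary uses the GPT-2 style byte-to-unicode table, which this page does not confirm; `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values to printable characters.

    Assumption: the tokenizer behind the dashboard uses this mapping.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Convert a raw byte-mapped token string to readable text (hypothetical helper)."""
    data = bytearray()
    for ch in raw:
        if ch in UNICODE_TO_BYTE:
            data.append(UNICODE_TO_BYTE[ch])
        else:
            # Character is already readable (for example plain Cyrillic); keep it.
            data.extend(ch.encode("utf-8"))
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["éĥ¨åĪĨ", "ÑĢован", "Ġones", "stuff"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Plain ASCII tokens pass through unchanged. A leading `Ġ` in the raw form stands for a leading space, which may explain why tokens such as "ones" and "stuff" appear more than once in the lists with different values.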
Activations Density 0.167%
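For orientation, the density figure can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that definition (activation strictly above a threshold, zero by default); the function name, the threshold parameter, and the sample data are illustrative and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means active tokens divided by all sampled tokens.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Illustrative data only: 167 active tokens out of 100,000 sampled.
    sample = [0.0] * 99_833 + [1.5] * 167
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.167% corresponds to roughly 1 active token in every 600 sampled.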