Explanations
phrases related to categorization or classification based on characteristics
phrasings that involve the concept of "sort of" in relation to various topics
Negative Logits
ajor      -0.83
ĸļ        -0.79
ļéĨĴ      -0.69
hens      -0.67
Zup       -0.67
enes      -0.66
tec       -0.65
hend      -0.65
orest     -0.65
Cub       -0.64
Positive Logits
thing     0.91
luck      0.79
stuff     0.71
fun       0.69
nerve     0.69
catentry  0.68
crap      0.68
nonsense  0.68
humility  0.66
things    0.66
Activations Density 0.040%