A Development Cycle for Automated Self-Exploration of Robot Behaviors

In this paper we introduce Q-Rock, a development cycle for the automated self-exploration and qualification of robotic behaviors. With Q-Rock, we suggest a novel, integrative approach to automate robot development processes. Q-Rock combines several machine learning and reasoning techniques to deal with the increasing complexity in the design of robotic systems. The Q-Rock development cycle consists of three complementary processes: (1) automated exploration of capabilities that a given robotic hardware provides, (2) classification and semantic annotation of these capabilities to generate more complex behaviors, and (3) mapping between application requirements and available behaviors. These processes are based on a graph-based representation of a robot's structure, including hardware and software components. A graph database serves as a central, scalable knowledge base to enable collaboration with robot designers, including mechanical and electrical engineers, software developers and machine learning experts. In this paper we formalize Q-Rock's integrative development cycle and highlight its benefits with a proof-of-concept implementation and a use case demonstration.


Introduction
Modern robotics has evolved into a collaborative endeavor, where various scientific and engineering disciplines are combined to create impressive synergies. Due to this increasingly interdisciplinary nature and the progress in sensor and actuator technologies, as well as computing hardware and AI methods, the capabilities and possible behaviors of robotic systems have improved significantly in recent years. Along with the greatly enhanced potential to strengthen established application fields for robotics and unlock new ones, these developments pose several challenges for developers and users interacting with robotic systems. On the one hand, for hardware and software engineers, these technological improvements have led to an increasing size of the design space and, hence, growing development, integration and programming complexity. Engineers not only have to deal with the technical peculiarities of a rich variety of different components when constructing a robot; they also have to develop advanced control strategies and integrate knowledge from a range of disciplines in order to unlock the full potential of a robot's capabilities.
On the other hand, the field of end users of applications for robotic systems widens, as more complex and versatile robots open up a wealth of novel applications for which robots were unsuited only years ago. These users will not be interested in the detailed construction of hardware or software, but will rather evaluate a robotic system by its possible behaviors and the tasks it can accomplish. However, without the domain knowledge of an experienced roboticist or AI researcher, designing a robot and the algorithms that provide the desired behaviors is next to impossible, and employing engineers for construction of a custom robot is likely to be prohibitively expensive.
We claim that both collaborative teams of roboticists and end users would greatly benefit from a unifying automated framework for robot development that spans several abstraction levels to interact on. To address this, we introduce Q-Rock [1], a development framework leveraging integrative AI to simplify and automate the whole process from robot design to behavior acquisition and final deployment, for experienced roboticists from different disciplines and novice users alike.
The core hypothesis underlying this project is that the set of all possible behaviors of a robotic system is inherently defined by its constituting hard- and software, and, furthermore, that this set can be found by self-exploration of the robotic system. Hence, a distinct feature of our approach is that the self-exploration of the hardware is as goal-agnostic as possible, such that novel behaviors can be synthesized from already explored capabilities of the system without having to repeat the exploration with a novel task in mind. This reusability is made possible by clustering capabilities and describing the resulting behaviors in a semantically annotated latent feature space.
Using a growing common knowledge base that links various description levels, from technical details of single components to behavior classification of self-explored robotic systems, we also provide a basis for behavior transfer between systems and reasoning about a possible robotic behavior given its composition. Hence, we propose a development cycle with multiple entry points that simplifies and speeds up the overall design process of robotic systems to benefit both developers and users.

Contributions
The main contribution of Q-Rock is the integration of several different subdisciplines of AI into a common framework in order to explore and qualify robotic capabilities and behaviors. To this end, we integrate state-of-the-art methods and develop new approaches in four key areas: (i) assembly of mechanical, electronic and software components with well-defined interfaces and constraints; (ii) exploration and clustering of capabilities aided by machine learning; (iii) ontology-based semantic annotation of behaviors with user feedback; and (iv) reasoning about possible behaviors given a robot's hard- and software composition. In this paper, we provide the theoretical concepts of our approach, elaborate on some implementation details, and demonstrate a use case scenario for developing a robotic manipulator with a reach behavior.
An important point to note is that Q-Rock relies on a well-defined robot hardware design process, i.e., a database that provides the information to couple hardware and software components automatically by specifying requirements and compatibilities. We see this part as a crucial pre-requisite to implement the Q-Rock concept, for which some foundations were developed in the precursor project D-Rock [2].
As already stated, Q-Rock combines different fields of research, where each field might have its own interpretation and definition of the same term. Where a term carries conflicting connotations across fields, we introduce a new term instead in order to avoid confusion.

Paper Outline
In the following Section 2, we give a short overview of the current state and limitations of automatic robot behavior learning. Section 3 introduces the concept of Q-Rock, provides an overview of the methodology and formally defines the procedures and abstractions to implement Q-Rock. Section 4 describes exemplary use case scenarios to illustrate the implemented concepts presented in this paper. A discussion in Section 5 concludes the paper.

State of the Art
Multiple disciplines, i.e., knowledge reasoning, task planning as well as machine learning, can provide important methods to explore robotic capabilities or combine capabilities to generate more complex ones. However, to the best of our knowledge, there is little work that approaches automated robot design in a holistic way as Q-Rock does. We highlight relevant holistic approaches here, whereas related work in the subdisciplines of the Q-Rock development cycle is described in the corresponding paragraphs.
Ha et al. [3] suggest an automated method for the design of robotic devices using a database of robot components and a set of motion primitives. They use a high level motion specification in the form of end-effector waypoints in task space. Their system then takes this motion specification as input and generates the simplest robot design that can execute this user-specified motion. However, Ha et al. do not consider the inverse problem which is a hallmark of Q-Rock: finding all motions a device can perform.
A similar development, tackling the problem of learning motion behaviors via exploration is pursued in the project memmo (Memory of Motion) [4], where a graph in state space is generated during exploration, and where the links between nodes refer to control strategies adhering to the system dynamics. Both graph and control strategies are refined during exploration, and the resulting trajectories are then used during deployment to warm-start an optimal control framework. The key difference to our approach is that in the memmo framework, task objectives need to be known and encoded in a loss function for training, whereas our framework is mostly goal agnostic during exploration.
Access to robotics development via a web-based platform is provided as part of Amazon Web Services (AWS) [5]. The services include RoboMaker, which essentially enables the use of ROS-based tools via a browser window, so that the user does not have to install any tools locally. However, as far as we could determine, most of these services are commercial, even though an account can be created free of charge. Additionally, even though the ROS community provides many solutions for different applications, a tool that provides easy access for non-expert users, as aimed at by the Q-Rock system, is lacking and is also not provided by AWS.
Another holistic approach for constructing and simulating robots is presented by the Neurorobotics Platform [6], under development within the Human Brain Project [7]. At the time of writing, this web based framework includes an experiment designer, robot construction for simple toy robots (Tinkerbots [8]), a range of predefined robots and brain models, and various plotting and visualization tools. The focus lies on fostering collaboration between neuroscientists and roboticists and providing simulated embodiment for biologically inspired brain models. In Q-Rock we rather focus on exploration of possible capabilities given a robot's composition, and linking these capabilities and corresponding behaviors to its properties.

Q-Rock Development Cycle
To explore and annotate the inherent capabilities and possible behaviors of a robot and subsequently allow for reasoning about relations between composition and behaviors, Q-Rock combines different kinds of AI techniques in a development cycle (see Fig. 1). This cycle can be driven by the high-level task specifications of a user, but is also flexible enough to support experienced domain experts. The cycle is divided into three major steps: (i) simulation-based exploration of the capabilities of a given piece of robot hardware, (ii) clustering and annotation of these capabilities to generate so-called cognitive cores, and (iii) model-based reasoning about the set of cognitive cores that are required for a specific task. A graph database (DB) provides the central knowledge base to connect all steps. The DB provides information about known hard- and software components and the structure of available robotic systems, exchanges data between workflow steps, and stores results. The development cycle can be initiated from two entry points (E1 and E2, illustrated in Figure 1). The first entry point E1 enables a bottom-up development approach. Here, the goal is to identify the capabilities of a given robotic system or subsystem: E1 starts with the hard- and software composition and ends with all capabilities of that system, organized in semantically described cognitive cores.
The second entry point E2 represents a top-down approach. A user triggers the development cycle by providing a task definition, i.e., a given user scenario consisting of an environment and a specific problem that a robot, which is not known to the user, shall solve. The goal is to either find a robot in the database that is suitable to address the specified problem or to suggest a novel composition that will likely solve the task.
Complementary to the Q-Rock development cycle overview in Figure 1, we provide a standard Entity Relationship diagram in Figure 2 to illustrate involved entities and their relationships. The following sections motivate and outline the different steps of the Q-Rock development cycle, and successively introduce these entities and their definitions to formalize our approach.

Modelling Robot Composition
For all steps of the development cycle it is essential to have a well-defined model of a robotic system. In Q-Rock, we represent a robotic system - i.e., the specific types and compositions of, as well as relations between, robot hard- and software components - using a graph-based model.

Related Work
The formal Architecture Analysis and Design Language (AADL) is designed to describe both processing and communication resources, as well as software components and their dependencies. A system designer is supposed to thoroughly model the system design, such that an application designed by the application designer can be deployed on the system. Furthermore, it is possible to use special tools to perform design analysis prior to compilation and/or testing in order to find errors before deployment. A detailed overview of AADL can be found in [9].

Fig. 1: The Q-Rock development cycle consists of three complementary steps: "Exploration", "Classification and Annotation", and "Reasoning". A graph database serves as a central knowledge representation and data exchange hub. The process may be initiated from two entry points (E1 and E2), depending on the intention of a user. The entry point specifies whether Q-Rock follows a bottom-up (E1) or top-down (E2) development approach.
TASTE is a framework developed by the European Space Agency to design, test and deploy safety-critical applications. It uses AADL as the modelling layer to design systems and applications. Based on these models, the framework builds the glue code and enables the deployment of the software to a variety of different processing and communication infrastructures. Details can be found in [10].
In contrast to the aforementioned approaches, the domain-specific language NPC4 developed by Scioni et al. [11] uses hypergraphs to model all aspects of structure in system design, software design and other domains. Its four main concepts are node (N), port (P), connector (C) and container (C) combined with the two relations contains (C) and connect (C); refinements of these concepts form domain-specific sublanguages. A detailed description of the concept and the language NPC4 is presented in [11].
Our approach aims at exploiting the flexibility and formalization of NPC4 for a structural reasoning approach and combining it with the well-known and tested concepts of TASTE/AADL. However, unlike NPC4, our approach is based on standard graphs to make use of state-of-the-art database technology.

Components
Components represent the hard- and software building blocks of robotic systems, which can be combined to generate more complex components. Hence, a hierarchy of components of different complexity is created. At the lowest level of this hierarchy are atomic components, which cannot be divided into other components in our model.
Components are grouped into a predefined, but extendable set of domains D = {S, P, M, E, A}. The domains are described in Table 1. Each domain can only form new components by combining other components of the same domain (unless they are atomic components). The only exception is the Assembly domain, which allows the composition of components of different domains. Thus, the Assembly domain is the one in which complete robotic systems - including their mechanical, electrical, processing and software structure - can be represented.
The main entities and their relationships are represented as labelled vertices and edges in a graph G = (V, E, s, t, Σ, p_v, p_e). Here, V is the vertex set, E the edge set with s, t identifying source and target vertices of an edge, Σ is a vocabulary, and K ⊂ Σ\{∅} is a set of predefined keys with label ∈ K. Property functions for vertices and edges are defined as p_v : V → K × Σ and p_e : E → K × Σ; the entities and relations they encode are listed in Table 2 and Table 3. Note that all entities are represented as vertices in the graph, so that all entity sets listed in Table 2 are subsets of the vertex set V, and likewise all relations are subsets of the edge set E. Relations have to be constrained to form a consistent system. The relations I_C, I_I, P, S are many-to-one relations; that means that no element of their domain can be mapped to more than one element of their co-domain. This constraint ensures, for instance, that parts of one component model cannot be parts of another component model. The relations H, H_C are one-to-many relations, thus preventing ambiguity of interfaces between different entities. C_o is a many-to-many relation, allowing any connection between interfaces, whereas the A relation is a one-to-one relation between an external and an internal interface. Given these entities and relations, the graph operators listed in Table 4 are defined on the graph. Algorithm 1 serves as an example to illustrate the usage of these operators to construct the component model of a robotic leg. It is assumed that a component model for a robotic joint J ∈ M_C with two (external) mechanical interfaces a, b ∈ I exists, as well as a component model for a robotic limb L ∈ M_C with two (external) mechanical interfaces x, y ∈ I. Figure 3 visualizes the graph structure resulting from running Algorithm 1. The graph has three components g_i - gears of different ratio - in the mechanical domain (g_i ∈ M). One of them has been instantiated (I_C) and is part of (P) an actuator A ∈ A ∩ M_C.
Chaining the respective relations I_C^{-1} ∘ P (see Table 3) resolves the composition: the component instance actuator a, with stator and gear as its composing parts, is combined with controller electronics and controller software to define the joint model. This model defines the structure of joint instances in the higher-level leg component. In summary, gears and rotor/stator components form an actuator; the actuator, the controlling electronics and software components form a joint, which in turn can be used to define a robotic leg.
Algorithm 1 Example application of graph operators to constructing a component model
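As an illustration, the construction performed by Algorithm 1 can be sketched with a minimal in-memory graph. The operator names (`add_component`, `instantiate`, `connect`) and the string labels below are our own illustrative stand-ins for the graph operators of Table 4, not the actual Q-Rock API:

```python
# Minimal sketch of a labelled component graph; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)   # id -> properties (p_v)
    edges: list = field(default_factory=list)      # (source, target, relation)

    def add_component(self, cid, domain):
        self.vertices[cid] = {"label": "ComponentModel", "domain": domain}

    def add_interface(self, cid, iid):
        self.vertices[iid] = {"label": "Interface"}
        self.edges.append((cid, iid, "HAS"))              # H: one-to-many

    def instantiate(self, model_id, inst_id):
        self.vertices[inst_id] = {"label": "ComponentInstance"}
        self.edges.append((inst_id, model_id, "INSTANCE_OF"))  # I_C: many-to-one

    def add_part(self, inst_id, parent_id):
        self.edges.append((inst_id, parent_id, "PART_OF"))     # P: many-to-one

    def connect(self, i1, i2):
        self.edges.append((i1, i2, "CONNECTS"))                # C_o: many-to-many

g = Graph()
# Component models: a joint J and a limb L, each with two mechanical interfaces.
g.add_component("J", "M"); g.add_interface("J", "J.a"); g.add_interface("J", "J.b")
g.add_component("L", "M"); g.add_interface("L", "L.x"); g.add_interface("L", "L.y")
# Compose a leg model from one joint instance and one limb instance.
g.add_component("Leg", "M")
g.instantiate("J", "j1"); g.add_part("j1", "Leg")
g.instantiate("L", "l1"); g.add_part("l1", "Leg")
g.connect("J.b", "L.x")  # couple the joint output to the limb
```

Chaining `INSTANCE_OF` and `PART_OF` edges then recovers the composition hierarchy, mirroring the relation chaining described above.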

Exploration
The exploration step in the Q-Rock development cycle aims at finding all capabilities a given piece of robotic hard- and software can provide. Capabilities in this context mean the possible trajectories a robot can produce within a robot state space and a world state space. The exploration is based on simulating a robot that has been modelled as formally described in Section 3.1 - a practical example is given in Section 4.

Related Work
In contrast to other approaches in robotics, we try to avoid directing the exploration towards any kind of goal, and instead aim at generating a maximal variety of capabilities to find a representative set that might include novel, unanticipated ones. Capability exploration methods usually aim to create a library of diverse capabilities [12,13,14,15], so that the coverage of the behavior space is maximized and the capabilities can be utilized in different tasks and environments. In order for the capabilities to be transferable between tasks, these approaches avoid task-specific reward functions. Instead they use intrinsic motivations such as novelty, prediction error and empowerment. An extensive overview of intrinsic motivations in reinforcement learning can be found in [16].
Because of their inherent incentive to explore and find niches, evolutionary algorithms are natural candidates for behavior exploration. Lehman and Stanley [17] propose to use novelty as the sole objective of likewise-called novelty search. It was found to perform significantly better than goal-oriented objectives in deceptive maze worlds. Novelty search has already been applied to robotics to find multiple diverse high quality solutions for a single task [18,19]. Cully [20] suggests combinations of different methods, e.g., quality-diversity optimization methods and unsupervised methods, which allow to explore various capabilities of a system without any prior knowledge about their morphology and environment.

Formalization
The abstract approach of the exploration is depicted in Figure 4; it serves as a high-level description of the process, which is formalized in the following paragraphs. Exploration discovers a set of capabilities by applying a search strategy, where the challenge lies in handling a very large state space. We tackle the large state space using a purely parametric encoding of capability functions: the encoding is compact and yet arbitrarily precise. Creating a capability function from a dedicated capability function model and applying it to the actual robot in an execution loop results in a capability of the system, where a capability is the executed trajectory in the world state space. This structure allows validating the feasibility of capabilities and clustering robot-specific execution characteristics on the basis of the input parameter space.

States
Definition (Joint State). The joint state q ∈ Q is a vector of all joint positions of the robot. For a robot with n joints: Q ⊂ Q_1 × Q_2 × ··· × Q_n. Q is a subset of the Cartesian product because not all combinations of joint positions may be allowed due to the robot structure.
Definition (Actuator State). The actuator state s_a of the robot is a tuple of the joint configuration and the joint velocities, so that s_a = (q, q̇) ∈ S_a, where the actuator state space is S_a = Q × Q̇. The robot actuator state completely describes the positions and velocities of all parts of the robot at a given time.
The complete (observable) state of the robot contains not only the actuator state s_a but also the states s_s = (s_{s,1}, ..., s_{s,m}) ∈ S_s of all m sensors and possibly internal states s_i.
Definition (Robot State). The (full) robot state s_rob is a combination of actuator, sensor and internal states. An internal robot state s_i = (s_{i,1}, ..., s_{i,k}) ∈ S_i for k internal properties may encompass, for example, internal time, battery status or a map of the robot's surroundings. S_a, S_s and S_i are the sets of all possible actuator, sensor and internal states, respectively. The full robot state reads: s_rob = (s_a, s_s, s_i) ∈ S_rob = S_a × S_s × S_i. The robot state s_rob does not contain the complete information about the actual physical or internal state of the robot; it only contains information that is accessible to the robot itself, i.e., that can be captured. Information about sensorless, unactuated joints, for example, is not part of the robot state.
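The state definitions above can be sketched as simple data structures; the field names below are our own illustration, since the actual Q-Rock data model lives in the graph database:

```python
# Illustrative sketch of the actuator/sensor/internal state decomposition.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ActuatorState:
    q: Tuple[float, ...]    # joint positions q ∈ Q
    dq: Tuple[float, ...]   # joint velocities q̇

@dataclass(frozen=True)
class RobotState:
    actuator: ActuatorState            # s_a
    sensors: Tuple[float, ...] = ()    # s_s, readings of the m sensors
    internal: Tuple[float, ...] = ()   # s_i, e.g. internal time, battery level

# A two-joint robot with one sensor and a battery-level internal state.
s_rob = RobotState(ActuatorState(q=(0.0, 0.5), dq=(0.0, 0.0)),
                   sensors=(1.2,), internal=(0.95,))
```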

Actions
To trigger changes of the robot state and thereby generate trajectories of robot states, it is necessary to define motor actions. A motor, which is part of a joint, outputs a motor torque τ ∈ T. The torques for all joints can be written as a tuple τ = (τ_1, ..., τ_n) ∈ T = T_1 × T_2 × ··· × T_n. An idle joint always outputs τ = 0.
Definition (Action). A kinematic action a_kin is a tuple of a torque τ ∈ T and a time interval Δt, such that a_kin = (τ, Δt) ∈ A_kin, where A_kin denotes the kinematic action space. Applying a kinematic action to the robot maps the current robot state to a new robot state: a_kin : S_rob → S_rob. Besides kinematic actions, there are also perceptive actions a_per ∈ A_per which evaluate sensor data and store abstractions in the internal robot state: a_per : S_s → S_i. Finally, there are internal actions a_int ∈ A_int processing internal information: a_int : S_i → S_i. The full action space is the Cartesian product of the individual action spaces: A = A_kin × A_per × A_int.
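A toy illustration of applying a kinematic action (τ, Δt): here we assume, purely for the sketch, decoupled unit-inertia joints and naive forward-Euler integration in place of a real simulator.

```python
# Apply a constant torque tau for a time step dt to a double-integrator
# joint model. The dynamics are a stand-in for a physics simulator.
def apply_kinematic_action(q, dq, tau, dt, inertia=1.0):
    """Map the actuator state (q, dq) to a new state under action (tau, dt)."""
    ddq = [t / inertia for t in tau]                  # accelerations per joint
    dq_new = [v + a * dt for v, a in zip(dq, ddq)]    # semi-implicit Euler step
    q_new = [p + v * dt for p, v in zip(q, dq_new)]
    return q_new, dq_new

q, dq = apply_kinematic_action([0.0, 0.0], [0.0, 0.0], tau=[1.0, -1.0], dt=0.1)
```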

Environments & World State
Note that environments with different properties, including, but not limited to, gravitational force, pressure, and temperature will have an influence on the outcome of a kinematic action. Hence, environmental parameters as well as poses and properties of objects in the robot's workspace have to be considered when evaluating the feasibility of kinematic actions. Furthermore, the environment is an important component for identifying certain properties of capabilities. For example, a throwing capability relies on the temporal evolution of the states of the object to be thrown, which is represented by environment states that can be external to the robot (if it does not have the appropriate sensing capabilities). This applies even to capabilities that can, at first glance, be considered mostly environment-independent: the effect of actions on the trajectory of an end effector when pointing is still determined by gravity and the viscosity of the medium in which the movement is performed. Moreover, there is no generic way to determine the poses of all the robot's limbs just from sensing the actuator states s_a: if a system is underactuated or does not have sensors on some actuators, an analytical solution for the forward kinematics may not exist. To compensate for this, we introduce the world state space, which may also contain information unavailable to the robot itself: Definition (World State Space). The world state s_world ∈ S_world = S_rob × S_obs, where the observational state space S_obs contains states read from the environment, e.g., the position and orientation of objects or robot limbs and end effectors. These states are obtained during simulation or by monitoring a real-world execution, and are accessible to the robot only if it has the appropriate sensing capabilities.

Capability
A particular capability requires the execution of a sequence of actions. Such an action sequence can be represented by a capability function, which selects an action for the robot based on the current state and time, and thus defines how the robot is supposed to (re-)act in a given situation.
Definition (Capability Function). A capability function is a function cap that maps the robot state at a given time to an action: cap : S_rob × R → A, with cap ∈ CF, where CF denotes the capability function space. An important detail to note is that the capability function operates on the robot state space and not the world state space: a capability function is a robot-inherent function that considers only information that is available to the robot itself.
A capability function can be created in various ways, e.g., it could be a policy obtained from reinforcement learning, a behavior from an evolutionary algorithm, or a control law from optimal control theory.
In general, the generation of capability functions can be formalized with a capability function model: Definition (Capability Function Model). A capability function model is a mapping cfm from a parameter space Θ to the capability function space: cfm : Θ → CF. The capability function model introduces a parameter space Θ, which allows the parametric generation of capability functions and is the basis for the exploration. In order not to constrain the exploration, the capability function model should be able to represent all kinds of capability functions of a system. In principle, however, it is also possible to operate with multiple capability function models at once.
By repeatedly calling a capability function and applying the resulting actions to the robot, a capability is executed.
Definition (Capability). A capability l_T ∈ L, where T is the finite time horizon and L the space of all trajectories, is defined by a sequence of world states and time coordinates: l_T = ((s_0, t_0), (s_1, t_1), ..., (s_T, t_T)), where the transition between successive states s_t and s_{t+1} is effected by an action a_t of the robot.
Capabilities are central entities in the Q-Rock philosophy, since we argue that a complete set of all possible capabilities is the most fundamental representation of what a system is able to do.
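The chain from parameters θ through cfm to an executed capability can be sketched as follows. An open-loop sinusoidal torque policy serves as an illustrative capability function model, and a naive Euler rollout stands in for the simulator; all names here are our own, not the Q-Rock implementation:

```python
# Toy capability function model cfm: Θ → CF and capability execution.
import math

def cfm(theta):
    """theta = (amplitude, frequency, phase) per joint, flattened."""
    params = [theta[i:i + 3] for i in range(0, len(theta), 3)]
    def cap(s_rob, t):
        # Capability function: (robot state, time) -> kinematic action (τ, Δt).
        tau = tuple(a * math.sin(2 * math.pi * f * t + p) for a, f, p in params)
        return (tau, 0.01)
    return cap

def execute(cap, q0, dq0, horizon=100):
    """Roll out a capability function; the result is the trajectory l_T."""
    q, dq, t, traj = list(q0), list(dq0), 0.0, []
    for _ in range(horizon):
        tau, dt = cap((q, dq), t)
        dq = [v + u * dt for v, u in zip(dq, tau)]   # unit-inertia Euler step
        q = [p + v * dt for p, v in zip(q, dq)]
        t += dt
        traj.append((tuple(q), tuple(dq), t))
    return traj

trajectory = execute(cfm([0.5, 1.0, 0.0]), q0=[0.0], dq0=[0.0])
```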
To refine the notion of completeness, we make two crucial assumptions at this point: Assumption (Discretized Time). We assume a discretized time model, arguing that (most) robots are controlled by digital hardware or controllers which have a specific clock or controller frequency. The smallest time step considered here is denoted by δt.
Assumption (Discretized State Space). We assume that S is sensed by digital sensors and consider state changes only if they can be distinguished. As a consequence, we have a discretized state space S.
While this discretization reduces the cardinality of L L L, it is still countably infinite if t is not bounded, so we have to choose a maximal capability length T. Now, in principle, a complete set of all possible capabilities up to a maximal length can be generated. Not surprisingly, this set would still have an intractable size considering typical resolutions of modern hardware and degrees of freedom [21].
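A back-of-the-envelope count makes this intractability concrete. Assuming, hypothetically, a robot with 6 joints, 12-bit torque resolution per joint and a modest horizon of 100 time steps, the number of distinct action sequences alone is astronomical:

```python
# Count the action sequences of a hypothetical 6-joint robot.
torque_levels = 2 ** 12                       # 12-bit torque resolution per joint
joints = 6
horizon = 100                                 # capability length T in time steps
actions_per_step = torque_levels ** joints    # 2^72 distinct torque vectors
sequences = actions_per_step ** horizon       # 2^7200 action sequences
digits = len(str(sequences))                  # 2168 decimal digits
```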
As a full set of capabilities is not tractable, the next best thing is a representative set of capabilities with a uniform distribution in a given feature space. An evolutionary algorithm such as novelty search [17] offers a suitable approach. With novelty search it is possible to search for novel capabilities with respect to a previously specified characteristic. A representative set of capabilities, obtained with an exploration strategy like this, may serve as a starting point for exploring the space in a finer resolution, for capturing the system dynamics in a model, or for searching for a specific capability.
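A minimal novelty-search loop in the spirit of Lehman and Stanley [17] can be sketched as follows: candidates are archived if their behavior characterization is sufficiently far from everything seen so far. The scalar characterization, mutation scheme and thresholds below are illustrative stand-ins, not the exploration strategy actually used in Q-Rock:

```python
# Toy novelty search over capability parameters θ.
import random

def novelty(bc, archive, k=3):
    """Mean distance to the k nearest behavior characterizations."""
    if not archive:
        return float("inf")
    dists = sorted(abs(bc - a) for a in archive)
    return sum(dists[:k]) / min(k, len(dists))

def novelty_search(characterize, generations=50, threshold=0.1, seed=0):
    rng = random.Random(seed)
    archive, theta = [], 0.0
    for _ in range(generations):
        candidate = theta + rng.gauss(0, 0.5)   # mutate the parameters
        bc = characterize(candidate)            # e.g. final end-effector position
        if novelty(bc, archive) > threshold:
            archive.append(bc)                  # novel capability: keep it
            theta = candidate
    return archive

# Stand-in characterization: behavior descriptor is simply θ².
archive = novelty_search(characterize=lambda th: th ** 2)
```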
Because the capability function model itself is robot-agnostic, it is a priori not clear which parameters θ correspond to feasible capabilities of the robot. For this reason, a validation model is trained that predicts which parameters θ lead to capabilities that are executable on the robot in the current environment.
After the exploration phase is finished, the obtained library of capabilities, the capability function model with the simulator to generate new capabilities, and the validation model are saved to the database and can be used by the following step.
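The validation model can be sketched as a binary classifier over the parameter space, trained on feasibility labels obtained from simulated rollouts. A nearest-neighbour majority vote stands in for whatever learned model is actually used; the data and names are illustrative:

```python
# Sketch of a validation model: predict whether parameters θ are feasible.
def train_validation_model(samples):
    """samples: list of (theta, feasible) pairs from simulated rollouts."""
    def predict(theta, k=3):
        nearest = sorted(samples,
                         key=lambda s: sum((a - b) ** 2
                                           for a, b in zip(s[0], theta)))[:k]
        votes = sum(1 for _, feasible in nearest if feasible)
        return votes * 2 > k                 # majority vote among k neighbours
    return predict

# Toy data: θ near the origin respects (hypothetical) torque limits.
data = [((0.1, 0.2), True), ((0.3, 0.1), True), ((0.2, 0.4), True),
        ((2.0, 2.0), False), ((1.8, 2.2), False), ((2.1, 1.9), False)]
is_feasible = train_validation_model(data)
```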

Classification and Annotation
The goal of this workflow step is the creation of cognitive cores. Cognitive cores are hubs that connect a specific behavior model with the robot's hard- and software, semantic annotations of that behavior model, and robot-specific capabilities that execute the behavior. Cognitive cores allow the execution of the corresponding behavior by using constraints and target values in semantically annotated feature spaces, and rely on clustering of capabilities in these spaces. Cognitive cores are central entities in Q-Rock since they constitute our solution to the symbol grounding problem, i.e., they link semantic descriptions to subsymbolic representations, and serve as a basis for reasoning about the relation of hard- and software components, robotic structure, and resulting behavior.

Related Work
One important point of our approach is the clustering of capabilities into feature spaces and the control of the robot within these feature spaces. Several studies have shown performance and robustness benefits from controlling a simulated agent in a latent, compressed feature space. Ha et al. [22] used a variational autoencoder on visual input to control a car in a 2D game world. In the context of hierarchical reinforcement learning, Haarnoja et al. [23] showed that control policies on latent features outperform state-of-the-art reinforcement learning algorithms on a range of benchmarks, and Florensa et al. [24] found high reusability of simple policies spanning a latent space for complex tasks. In a similar vein, Lynch et al. [25] investigate using a database of play motions, i.e., teleoperation data of humans interacting in a simulated environment out of intrinsic motivation, combined with projections into a latent planning space to generate versatile control strategies. While the latter study has parallels to our segmentation into an exploration and a clustering phase, no previous approach aims at a semantically accessible feature representation, as we propose in Q-Rock.
A method for generating a disentangled latent feature space from observations was developed by Higgins et al. [26]. This variational autoencoder builds on the classical autoencoder architecture [27], [28] that compresses data into a latent space. The authors note that disentangling seems to produce features that are also meaningful in a semantic sense, such that changes in feature space lead to interpretable changes in state space. A combination of unsupervised clustering and variational autoencoders is described by Dilokthanakul et al. [29]. However, direct semantic annotation of these features and a formalized combination into behavior models has not been considered to date.
A practical aspect of our formalization is the execution of clustered capabilities, for which we employ training of generative models based on normalizing flows [30], [31], which has shown advantages over classical methods such as Gaussian mixture models.

Formalization
To arrive at a formal definition of cognitive cores, we first need to clarify what constitutes a behavior. Since the term "behavior" has overloaded definitions in various disciplines, we specifically mean behavior in a broad, radical behaviorist sense, while emphasizing the phenomenological aspects: everything an agent does is a behavior, and all behaviors must in principle be completely observable [32]. The complete observation is provided by the capabilities as defined in Section 3.2. We further define a behavior model as an abstraction of similar capabilities that have the same semantic meaning: a behavior of "walking" is not bound to the exact execution of a sequence of robot and world states; rather, a large number of capabilities that differ in certain aspects would intuitively be labelled as "walking". We thus propose that different behavior models can be identified by finding constraints on capabilities in appropriate feature spaces, leading to the following definition:

Definition (Behavior Model). A robot agnostic, semantically labelled abstraction of a set of capabilities L that adhere to constraints in feature spaces.
Feature spaces arise from transformations of the capabilities via a feature function to capture specific aspects, and allow to define distances between capabilities within these aspects:

Definition (Feature Function). A feature function ff_k maps capabilities l ∈ L to a set of values in R^{n_k}, so that ff_k : L → R^{n_k}. This function is supplemented with a semantic description.
Definition (Feature Space). A feature space F_k comprises the values of a feature function ff_k together with a metric m_k. It is uniquely defined by the combination of ff_k and its metric m_k.
An important aspect of the feature functions is their semantic descriptions, which constitute the language in which the behavior models are defined.
The feature functions ff_k can be obtained in two different ways. Either they are defined manually and directly annotated with a semantic description, e.g., 'average actuator velocity'. Alternatively, they can be found automatically in a purely data-driven way, e.g., by using variational autoencoders [26] adapted to trajectory data. The Q-Rock framework allows both approaches, which can also be used in parallel to provide maximal flexibility, although the latter is not implemented to date. We thus enable the use of expert knowledge to define the most relevant feature spaces for a given problem, while a non-expert user could resort solely to the automatic approach. In addition, interesting feature functions that reflect the specifications of the robot model might be discovered automatically that are not obvious, even to an experienced observer, or hard to formulate.
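A manually defined feature function of the first kind could look as follows. This sketch (our naming and discretization) treats a capability as a T×n array of joint positions sampled at a fixed time step and maps it to the scalar feature 'average actuator velocity':

```python
import numpy as np

def ff_avg_velocity(joint_traj, dt=0.1):
    """Feature function ff_k: map a capability, given as a (T, n) array of
    joint positions sampled every dt seconds, to its mean absolute joint
    velocity (a value in R). Semantic description: 'average actuator velocity'."""
    vel = np.diff(joint_traj, axis=0) / dt  # finite-difference joint velocities
    return float(np.mean(np.abs(vel)))

# One joint moving at a constant 1.0 rad/s for one second.
traj = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
print(ff_avg_velocity(traj))
```

The returned scalar then lives in a one-dimensional feature space in which, e.g., "slow" and "fast" capabilities become separable.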
An example of a behavior model defined by constraints in feature spaces is:

  label: reach
  constraints:
    F_1:
      min: 0.95
      max: 1.0

Currently, only min/max and variable constraints are implemented. A variable constraint means that a target value f_k^tar has to be provided when the associated behavior is to be executed by a robot. Since behavior models are robot agnostic, they can be grounded for different robotic systems. The robot agnostic nature of the behavior model depends on the feature functions' semantic descriptions: feature functions having the same effect on a semantic level may have varying definitions for different robots, especially if they are represented by encoder networks or other function approximators. Thus it must be possible to identify feature functions across robots by their semantic description.
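As a data structure, such a behavior model reduces to a label plus per-feature-space constraints. A minimal sketch (our naming) covering both implemented constraint types:

```python
behavior_model = {
    "label": "reach",
    "constraints": {
        # min/max constraint on feature space F_1, as in the example above
        "F_1": {"type": "min/max", "min": 0.95, "max": 1.0},
        # variable constraint: a target value f_k^tar must be supplied
        # when the behavior is executed (illustrative addition)
        "F_start": {"type": "variable"},
    },
}

def required_targets(model):
    """Feature spaces for which a target value must be supplied at execution."""
    return [k for k, c in model["constraints"].items()
            if c["type"] == "variable"]

print(required_targets(behavior_model))
```

Because the model only refers to feature spaces by their semantic names, the same dictionary can be grounded on any robot whose feature functions carry matching semantic descriptions.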
To achieve a robot specific grounding of the behavior model, the feature spaces F_0, …, F_k are populated by mapping robot specific capabilities provided by the exploration step via the associated feature functions ff_1, …, ff_k.
In principle, if all capabilities of a robot are contained in the representative set provided by the exploration step, a simple lookup of capabilities that adhere to the behavior model constraints is sufficient to execute the desired behavior. However, as noted before, this usually implies a capability set of intractable size.
We tackle this problem in two ways. Firstly, the capabilities are clustered in each feature space F_k, and the centroids of the clusters are used to check constraints for all members of the cluster. While the result of this check is not exact for all capabilities, computational performance is greatly increased. Secondly, to avoid a lookup search when executing a behavior, and to not be restricted to capabilities seen during exploration, we learn generative models on the parameter sets θ from the capability clusters. Thus, clusters are represented by probabilistic generative models that, when sampled from, provide parameters θ which, via recurrent execution of the capability function model, lead to capabilities that likely lie in the intended clusters. Clusters are thus defined as:

Definition (Cluster). A cluster with label c_k^j is defined within a feature space F_k, which is associated with several clusters j ∈ [1, n_k], where n_k denotes the number of clusters found in F_k. Each cluster has a generative model G_k^j(θ) ≈ p(θ, c_k^j) = p(θ|c_k^j) p(c_k^j) that represents a probability distribution over parameter space, and a centroid f̄_k^j = ff_k(l(argmax_θ G_k^j(θ))), where we use l(θ) as a shorthand for the combination of capability function model and recursive application of the execution loop (see Figure 4).
Using generative models has the advantage that models from different clusters can be combined and jointly optimized to find a parameter set θ θ θ that generates a capability lying in several intended clusters. The clustering procedure is visualized in Figure 5.
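To illustrate the clustering step without the full normalizing-flow machinery, the following sketch (our simplification; the paper uses flow-based models [30,31]) clusters capabilities in a one-dimensional feature space with a plain k-means and fits one diagonal Gaussian per cluster as the generative model G_k^j over the parameters θ:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=20):
    """Plain k-means; initial centers are spread evenly over the data."""
    centers = x[np.linspace(0, len(x) - 1, k).astype(int)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Two well-separated groups of capabilities in a 1-D feature space F_k,
# each generated from distinct 2-D parameter vectors theta (synthetic data).
features = np.concatenate([rng.normal(0.0, 0.1, 50),
                           rng.normal(5.0, 0.1, 50)])[:, None]
thetas = np.concatenate([rng.normal(-1.0, 0.1, (50, 2)),
                         rng.normal(1.0, 0.1, (50, 2))])

labels, centers = kmeans(features, k=2)

# One diagonal-Gaussian generative model per cluster over parameter space:
# sampling from cluster j yields parameters near that cluster's thetas.
models = {j: (thetas[labels == j].mean(0), thetas[labels == j].std(0))
          for j in range(2)}
```

Sampling `rng.normal(mu, sigma)` from a cluster's `(mu, sigma)` pair then plays the role of drawing θ from G_k^j.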
After clustering, robot specific cognitive cores can be instantiated:

Definition (Cognitive Core). A cognitive core is an executable grounding of a behavior model for a specific robotic system, where the constraints of the behavior model are checked against the cluster centroids f̄_k^j. Clusters that satisfy these constraints are linked to the cognitive core. A cognitive core can only be generated when all behavior model constraints can be met.

These cognitive cores are described by a semantic annotation:

Definition (Semantic Annotation). A tuple SA = (L, X), where L is a set of labels with |L| ≥ 1 and X is the set of constraints.
By default, the cognitive core inherits the labels and constraints from its behavior model, but the semantic annotation can be augmented by robot specific information. This semantic annotation is the main interface between the generation of cognitive cores and the reasoning processes described in Section 3.4. The relation between feature spaces, clusters, behavior models, cognitive cores and semantic annotations is illustrated in Figure 6.
In this framework, the execution of a behavior on a specific robot, i.e., the execution of a cognitive core, comes down to finding a parameter set θ_max that jointly maximizes all generative cluster models adhering to the constraints of the behavior model. If a behavior model includes variable constraints, each target value f_k^tar in the corresponding feature space F_k needs to be assigned. The cognitive core then finds the cluster models with centroids closest to the variable inputs. Effectively, the cognitive core uses a constraint checking function cc(c_k^j) to determine the relevant clusters: cc(c_k^j) = 1 if the constraint type is "min/max" and min < f̄_k^j < max, or if the constraint type is "variable" and f̄_k^j is the centroid closest to f_k^tar under the metric m_k of the feature space F_k; otherwise cc(c_k^j) = 0. Note that this implies that several cluster models in the same feature space can fulfill a min/max constraint. The product of all currently relevant cluster models, i.e., the models G_k^j for which cc(c_k^j) = 1, results in a new probability distribution. The maximum of this distribution corresponds to a parameter set θ_max that has the highest likelihood of generating a capability that lies within all relevant clusters when used as input to the capability function model and the execution loop (see Figure 4). The maximization step is then formally written as

θ_max = argmax_θ ∏_{(k,j) ∈ M} G_k^j(θ), where (k, j) ∈ M if cc(c_k^j) = 1.

Since this approach is based on probabilistic modeling, it is possible that the capability associated with θ_max violates a constraint. However, assuming a smooth mapping Θ → F_k via the feature functions ff_k, the violation is likely mild. If not violating a particular constraint is important, for example to avoid collisions, different weights can be assigned to different constraints, which control the relative influence of the corresponding cluster models. Note that it is also possible that cluster models are combined that have nearly or completely disjunct distributions. Thus, in practice a probability threshold has to be set below which the maximization result θ_max is rejected and it is assumed that no capability exists that fulfills all constraints.

Fig. 5: Transformation functions ff_k are applied to map to feature spaces F_k. In these feature spaces, clustering is performed. The labelled clusters are used to train probabilistic generative models on the parameter space Θ, s.t. clusters can be stored in an efficient and expressive way. When sampling from the generative cluster models, parameters θ are generated that lead to capabilities in the intended cluster. The mapping from parameters to a capability is mediated by the capability function model and the execution loop (see Figure 4). During training, sampling of parameters and generation of new capabilities is used to verify model performance.

Fig. 6: Relation between features, clusters, behavior models, cognitive cores and semantic annotations. The behavior model is defined by constraints in feature spaces. This behavior can be grounded as a cognitive core for a specific system when clusters for this system exist that fulfill these constraints. Constraint-fulfilling clusters are linked to the cognitive core. The cognitive core inherits the generic label of the behavior model, but can have additional labels that describe specifics for this robot. The constituents of the semantic annotation are colored orange.
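If the cluster models are approximated by diagonal Gaussians over θ (a simplification of the flow-based models; naming is ours), the product of the relevant models is again Gaussian, and its maximum has a closed form as the precision-weighted mean:

```python
import numpy as np

def theta_max(models):
    """Maximize the product of diagonal-Gaussian cluster models
    G_k^j(theta) = N(mu, sigma^2). The product of Gaussians is itself
    Gaussian, so the joint maximum is the precision-weighted mean."""
    mus = np.array([m for m, _ in models])
    precs = np.array([1.0 / s ** 2 for _, s in models])
    return (precs * mus).sum(0) / precs.sum(0)

# Two relevant cluster models pulling theta in different directions;
# the tighter model (smaller sigma) dominates the joint maximum.
models = [(np.array([0.0, 0.0]), np.array([1.0, 1.0])),
          (np.array([1.0, 1.0]), np.array([0.5, 0.5]))]
print(theta_max(models))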
One important challenge of the approach is how behaviors are cast into the constraint-based, phenomenological behavior model we use. Since we aim at semantics that is intuitively understandable, we rely on human interaction. Thus, the first option Q-Rock provides is hand-crafting behavior models. Although it requires some domain knowledge, this approach scales well in the sense that, once defined, the behavior model can be grounded for many different robotic systems. In addition, we also envision semi-automated approaches: (1) behavior modelling from observation of human examples, and (2) modelling human evaluation functions with respect to a specific behavior. Approach (1) is based on research on end effector velocity characteristics for deliberate human movement [33,34]. These movement characteristics can be formulated as feature space constraints and thus used to define behavior models. For approach (2), it was shown that implicit biosignals of the human brain and explicit evaluation by a human observing simulated robot behavior can be used to effectively train a model of the underlying evaluation function [35], and to guide a robotic learning agent [36]. Here too, feature space constraints can be derived from the trained evaluation function approximator and used to define the behavior model.
At this point, we want to stress again that human interaction is absolutely necessary in the Q-Rock philosophy to define meaningful behavior. Throughout this workflow step, human labelling is required for feature spaces, cognitive cores and behavior models. The robot itself, after exploration, has no notion of causality, i.e., reaction to the environment, or purpose in what it is doing. Thus it is not behaving in the actual sense. Only through human semantic descriptions, i.e., what it would look like if the robot behaved in a certain way, are the capabilities of the robot in the environment ascribed to a meaningful behavior. As the Q-Rock database grows, we will explore automatically generated labelling of feature spaces based on similarity to already labelled ones, which could speed up the labelling process by providing reasonable first guesses.
To summarize, cognitive cores derived from behavior models are central entities in the Q-Rock workflow, since they cast explored capabilities into a semantically meaningful form and provide a way to generate new capabilities that adhere to characteristics found by clustering. In addition, their semantic annotation provides the basis for reasoning about the connection of possible robot behavior to the underlying hardware and software, which will be elaborated in the following section.

Reasoning
Structural reasoning serves two purposes in Q-Rock: (1) to suggest suitable hardware to solve a user-defined problem, and (2) to map an assembly of hardware and software to its function. The former does not involve any type of active usage of the hardware and software assembly, but exploits knowledge about the physical structure, interface types and known limitations / constraints when combining components, as well as their relation to labelled cognitive cores.
Essentially, structural reasoning establishes a bi-directional mapping between assemblies of hardware and software components and their function. Note that we explicitly do not use the term robot here, since the result of the mapping from capabilities might not be a single robot, but a list of hardware and software components.

Related Work
Knowledge Representation and Reasoning (KR&R) is considered a mature field of research, but there is still a gap between available encyclopedic knowledge and robotics. KnowRob [37], as knowledge processing framework, intends to close this gap and provides robots with the required information to perform their tasks. It builds on top of knowledge representation research, making the necessary adaptations to fit the robotics domain where typically much more detailed action models are needed. The core idea behind KnowRob is to automatically adjust the execution of a robotic system to a particular situation from generic meta action models. The platform is validated with real robots acting in a kitchen environment with a strong focus on manipulation and perception. Beetz et al. combine KnowRob with the usage of CRAM [38], which serves as a flexible description language for manipulation activities. CRAM in turn is used with the Semantic Description Language (SDL) which links capabilities with abstract hardware and software requirements through an ontological model. As a result, symbolic expressions in CRAM can be grounded depending on the available hardware. CRAM is, however, not a planning system that can be used to solve arbitrary problems. Instead it can formulate a plan template for an already solved planning problem.
Meanwhile, reasoning in Q-Rock aims at using planning, in particular Hierarchical Task Networks (HTNs), to generically formalize a problem in the robotic domain and to generate an action recipe as solution. HTN planning is an established technology with a number of available planners such as CHIMP [39], PANDA [40] or SHOP2 [41], but there is still no de-facto standard language comparable to the Planning Domain Definition Language (PDDL) [42] in the classical planning domain. Höller et al. [43] suggest an extension of PDDL to hierarchical planning problems named Hierarchical Domain Definition Language (HDDL) to address this issue. Nevertheless, formulating an integrated planning problem which includes semantic information remains an open challenge.

Top-Down: Identification of capable systems
We start by describing the process of structural reasoning from entry point E2 into the Q-Rock development cycle (see Figure 1). The workflow for the top-down reasoning is depicted in Figure 7.
To enter the cycle at E2, a user has to provide a description of an application problem to solve, i.e., defining tasks that should be performed and the application environment. The problem is described with a general language and is firstly hardware agnostic. This means that the application description explicitly states neither the use of a particular robot nor a robot type. While an input using natural language would be desirable for users to describe their application, Q-Rock uses a planning language such as PDDL as directly machine-readable format. Formulating the application problem firstly generically and secondly as a hierarchical planning problem allows the decomposition into a sequence of atomic/primitive tasks, where p ∈ P denotes a primitive planning task and P denotes the set of all primitive tasks. Q-Rock extends state-of-the-art planning approaches by (a) introducing a semantic annotation for each primitive task, and (b) representing the domain description, i.e., all tasks and decomposition methods, with an ontology.
The semantic annotation of a primitive task comprises a constraint-based description of what a task does in the classical sense of planning effects, i.e., what it requires to start the execution as preconditions and the conditions that have to prevail during execution. All conditions, including pre/prevail and post, can be tested using a predefined set of predicate symbols, which describe the partial world state including the environment state s_obs and the robot state s. Hence, the semantic annotation of a primitive task also includes pre and prevail conditions that link to the state of hardware and software components.
Definition (Semantically Annotated Primitive Task ). A semantically annotated primitive task p + ∈ P + is a tuple of a primitive planning task p and a semantic annotation SA, so that p + = (p, SA). P + denotes the set of all semantically annotated primitive tasks.
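The two definitions above translate directly into data types. A minimal sketch (our naming, with labels and constraints reduced to strings):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticAnnotation:
    """SA = (L, X): a non-empty set of labels L plus a set of constraints X."""
    labels: frozenset
    constraints: frozenset = frozenset()

    def __post_init__(self):
        assert len(self.labels) >= 1, "|L| >= 1 is required"

@dataclass(frozen=True)
class AnnotatedPrimitiveTask:
    """p+ = (p, SA): a primitive planning task paired with its annotation."""
    task: str
    annotation: SemanticAnnotation

# Illustrative task: a grasp primitive annotated with two labels.
grasp = AnnotatedPrimitiveTask(
    "grasp", SemanticAnnotation(frozenset({"manipulation", "grasp"})))
```

Frozen dataclasses keep annotations hashable, so they can be used as keys when matching tasks against cognitive cores.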
The top-down reasoning is based on a predefined planning vocabulary V_p = (P, C, d, sa) to specify problems, here representing a particular planning domain description, where the vocabulary consists of primitive tasks P and compound tasks C, decomposition methods d for compound tasks, and a mapping function sa : P → SA, with SA denoting the set of all semantic annotations. The top-down reasoning process is initially limited with respect to the expressiveness of this application specification language.
Transforming the user's problem into a planning problem and solving it results in a collection of plans, where each plan in this collection represents a robot type agnostic solution. This does not imply, however, that the requested task is solvable with currently available hardware. Each semantically annotated primitive task that is part of a solution has requirements for its execution, including, but not limited to, environmental, temporal, and hardware and software constraints. Therefore, an additional validation of these constraints has to be performed.
Requirements to execute a plan can be extracted from the semantic annotations belonging to all of its semantically annotated primitive tasks, in the simplest case by the use of labels which are organized in an ontology. Semantic annotations also describe cognitive cores, as explained in Section 3.3. Such a description might be incomplete in the sense that it does not capture every detail of the behavior of a cognitive core, but it serves to outline the semantics in an abstract and machine-processable way. Furthermore, it allows matching semantic annotations of tasks against semantic annotations of known cognitive cores, thereby identifying cognitive cores that can be used to tackle the stated problem (see Figure 8). Each cognitive core maps to a single robotic system, but primitive tasks can map to different cognitive cores. Finally, a solution is only valid if a single suitable system capable of performing all tasks can be identified. While this concept of matching tasks and cognitive cores could also be used to map to multiple systems that cooperate to solve the stated problem, Q-Rock focuses on single robots for now.
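In the simplest label-based case, matching reduces to set inclusion: a cognitive core can serve a task if the task's labels are covered by the core's labels, and a valid solution is a single robot whose cores cover every primitive task. A sketch under these assumptions (names are ours):

```python
def core_matches(task_labels, core_labels):
    """A cognitive core can perform a task if it carries all labels
    the task's semantic annotation demands."""
    return set(task_labels) <= set(core_labels)

def capable_robots(plan_tasks, cores):
    """cores: list of (robot, core_labels) pairs. A robot is a valid
    solution only if its cores cover every primitive task of the plan."""
    robots = {r for r, _ in cores}
    return [r for r in sorted(robots)
            if all(any(core_matches(t, ls) for rr, ls in cores if rr == r)
                   for t in plan_tasks)]

# Illustrative database: two robots, three cognitive cores.
tasks = [{"reach"}, {"grasp"}]
cores = [("armbot", {"reach", "move"}), ("armbot", {"grasp"}),
         ("wheelbot", {"move"})]
print(capable_robots(tasks, cores))
```

Ontology-based subsumption (a "grasp" label implying "manipulation") would replace the plain subset test in a fuller implementation.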
As outlined before, Q-Rock aims at a planning approach which does not focus on a particular robotic system, but provides abstract solutions. Although no specific robot types are considered, solutions can still comprise hardware requirements to solve a particular task. For instance, a requirement could be the emptiness of a gripper before starting a gripping activity. This particular precondition, however, also implies the availability of a gripper and thus restricts the applicable robot types for this task to those that have a gripper. A selected target object might induce additional constraints for lifting mass or handling soft objects, so that only a particular type of gripper can be used.

Fig. 7: Outline of the top-down reasoning, which firstly involves the definition of a (planning) problem and the subsequent generation of a generic solution. Secondly, capable robots are identified to provide a robot specific plan, or alternatively components are suggested that might be relevant to provide a capable robot.
Effectively, the following structural requirements exist for hardware and software components: (1) the existence of hardware and software components in the system, and (2) particular (sub)structures that hardware and/or software components form. Additionally, functional requirements exist which might imply structural requirements, so that functional requirements can be considered as higher-order predicates for tasks. These could be implemented similarly to the semantic attachments for planning actions suggested by [44]. Workspace dimensions and required maximum reach are examples of an extended task description, which limits the range of systems applicable for this task.
To create semantic annotations, Q-Rock uses a corresponding language L. Q-Rock uses an ontology to represent the vocabulary V ⊃ V_p ∪ V_SA of this language, which combines the planning vocabulary V_p and the semantic annotation vocabulary V_SA; the latter permits to specify components, labels (corresponding to behavior types) and constraints. While labels allow to classify and categorize behaviors, constraints allow to detail or rather narrow these behaviors further, e.g., manipulation with a constraint to manipulate a minimum of 100 kg mass poses a significant hardware constraint.
Semantic annotations characterize primitive tasks as well as behavior models and cognitive cores, and are therefore essential to link Q-Rock's clustering step with the top-down reasoning.

Bottom-Up: From Structure to Function
While the top-down reasoning process tries to find suitable hardware for a given task, the bottom-up approach aims at finding the functionality or rather tasks that a component composition can perform. The bottom-up identification of a robot's function is based on a formalization introduced by Roehr [45], who establishes a so-called organization model to map between a composite system's functionality and the structure of components. Functionalities can be decomposed into their requirements on structural system elements. As a result, the known structural requirements for a functionality can be matched against (partial) structures of a composite system to test whether this functionality is supported. Figure 9 depicts the bottom-up reasoning workflow, where an essential element of the bottom-up reasoning is the identification of feasible composite systems or rather assemblies. The combination of components requires knowledge about the interfaces of the components and permits a connection between any two components only if their interfaces are compatible. Multiple interfaces might be available for connection, and physical as well as virtual (software) interfaces can be considered. Roehr [45] limits the bottom-up reasoning to a graph-theoretic approach while excluding restrictions arising from the actual physical properties of the overall component, e.g., shape or mass. Q-Rock will remove that restriction and analyse the actual physical combinations of components as part of the so-called puzzler component. Currently, the puzzler also operates only on a logical graph-theoretic analysis, but in the future it will rely on representations of physical structures, e.g., the Unified Robot Description Format (URDF), for assembling new composite components from known atomic components.

Fig. 8: Matching semantic annotations in order to map from a task to a cognitive core that can perform this task.
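The core of the puzzler's graph-theoretic step is interface compatibility: two components may be joined only through compatible interfaces. A minimal sketch that enumerates feasible two-component connections; the component and interface names are illustrative, not taken from the Q-Rock database:

```python
from itertools import combinations

# Components with their interface types ("_m"/"_f" marks male/female).
components = {
    "joint_motor": ["flange_m", "flange_f"],
    "lower_pole":  ["flange_m", "flange_f"],
    "gripper":     ["flange_m"],
    "camera":      ["usb"],
}

def compatible(a, b):
    """Two interfaces are compatible if one is male, one female,
    and both share the same base type."""
    return (a.endswith("_m") and b.endswith("_f") and a[:-2] == b[:-2]) \
        or (b.endswith("_m") and a.endswith("_f") and b[:-2] == a[:-2])

def feasible_pairs(comps):
    """Enumerate component pairs that can be physically connected."""
    pairs = []
    for (c1, i1), (c2, i2) in combinations(comps.items(), 2):
        if any(compatible(a, b) for a in i1 for b in i2):
            pairs.append((c1, c2))
    return pairs

print(feasible_pairs(components))
```

Growing assemblies beyond pairs, plus constraints on mass, shape, and electrical limits, is what turns this pairwise check into the full combinatorial search of the puzzler.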
The puzzler outputs a set of feasible assemblies, which can subsequently be semantically annotated by (a) matching the structure to existing behavior models, (b) reasoning on the ontological description, and (c) using Hybrid Robot Dynamics (HyRoDyn) [46,47] for analysing the kinematic structure. An initial manual and later automated analysis of component structures in existing robotic systems could be used to identify generic design patterns in robotic systems. These can be used as heuristics to boost the bottom-up reasoning process. The bottom-up reasoning process is triggered by new additions of software and hardware to the database. Thereby, it helps to augment the Q-Rock database by adding new cognitive cores and increasing the options for solving user problems.

Fig. 9: The bottom-up reasoning is based on a combinatorial, constraint-based and heuristic search approach in order to identify feasible assemblies, which are subsequently characterized in an automated way.

Use Case Scenario
To demonstrate the Q-Rock system, we run through an exemplary workflow starting at entry point E1 (see Figure 1), namely constructing a robot model, exploring its movement capabilities, clustering these capabilities and generating a cognitive core. Whereas in this example we focus on movement capabilities, the framework is designed to handle all kinds of capabilities that can be described as trajectories in the combined robot and world state. We use a simple 3-DOF robotic arm and create a cognitive core reach, which moves the end effector of the robotic arm from a given start configuration to a chosen target position in task space. Note that we simplify this workflow by predefining the feature spaces which are used in clustering and for defining a behavior model.
The schematic workflow is visualized in Figure 10, while Figure 11 highlights steps of this workflow which are triggered through our implemented website. Several preparatory steps are, however, assumed and required for the workflow to run including the definition of (a) feature spaces and (b) behavior models.
We validate the creation of the cognitive core and the integration of the system by retrieving the cognitive core again via its labels from the database.

Defining Feature Spaces
To characterize a reach movement, the feature spaces firstly permit to extract the starting state and the end state of trajectories. Further qualities of a trajectory, such as its directness, are also included to penalize deviations of a trajectory from the direct route. As feature spaces we use:

1. F_start with label 'start state' and the transformation function f_start : L → S, which maps a given trajectory l_T ∈ L to its start state s_0 ∈ S,
2. F_end with label 'end effector end state' and the transformation function f_end : E → S, which maps a given end effector state trajectory to its final state e_T, and
3. F_dir with label 'end effector directness' and the transformation function f_dir : L → R.

Note that the features F_end and F_dir use the end effector position, which we assume is part of the world state S_obs, whereas F_start operates on the internal state S_rob of the robot.
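The three feature functions can be sketched as follows. The directness formula is our assumption (straight-line distance divided by travelled path length), since the text leaves its exact definition open:

```python
import numpy as np

def f_start(robot_traj):
    """'start state': first internal robot state s_0 of the trajectory."""
    return robot_traj[0]

def f_end(ee_traj):
    """'end effector end state': final end effector state e_T."""
    return ee_traj[-1]

def f_dir(ee_traj):
    """'end effector directness' (assumed definition): ratio of straight-line
    distance to travelled path length, in (0, 1]; 1.0 means a direct route."""
    path = np.sum(np.linalg.norm(np.diff(ee_traj, axis=0), axis=1))
    direct = np.linalg.norm(ee_traj[-1] - ee_traj[0])
    return float(direct / path) if path > 0 else 1.0

# A direct route scores 1.0; a detour scores below 1.0.
straight = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
detour = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]])
print(f_dir(straight), f_dir(detour))
```

Any deviation from the straight line lengthens the path and therefore lowers the directness, which is exactly the penalty the feature is meant to express.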

Defining Behavior Models
A behavior model provides the high-level abstraction for a behavior, thereby collecting its essential characteristics. In the case of the reach behavior, the key characteristic is the directness, which we expect to be high, so that it is bound to a minimum and maximum degree. Meanwhile, the start and the end state are variable, since the reach behavior needs to be applicable in a range of situations with different target poses. Hence, start and end state can be viewed as general input parameters to the behavior model:

  label: reach
  constraints:
    F_dir:
      min: 0.8
      max: 1.0

All defined feature spaces, the behavior model and the semantic annotation of the behavior model, which contains labels and constraints, are stored in the database to be accessible for all development steps.
Note that defining the behavior model is an essential, but currently also a limiting requirement, since the exploration of behaviors can only cover these predefined models.

System Assembly
To start the Q-Rock development cycle at entry point E1 a robotic system is required. Predefined robots or rather existing assemblies can be used to start the exploration. One of the major motivations of the Q-Rock development cycle is, however, the capability to explore any kind of hardware designs / assemblies and thereby support an open robot design process.
The so-called Robot Configurator workflow permits a user to create a robotic system in a simplified way, by combining a set of components that are defined in the database. Figure 11 illustrates the steps. Firstly, a user selects the desired items which are needed to build the robot and puts them in a shopping basket (see Figure 11a). For the reach example, an assembly is built from the following items: 1. pan tilt unit, 2. lower pole, 3. joint motor, 4. upper pole, and 5. end effector. After the selection has been completed, a CAD editor is started with the selected items already loaded (see Figure 11b). We use the open source editor Blender in combination with the extended functionality of the Phobos plugin [48] and another plugin to interface with the database. The user can build the desired system by selecting interfaces in the GUI and requesting to connect components through these interfaces, which is only possible if the selected interfaces are compatible. The overall procedure requires only very limited editing competencies of a user, thus significantly lowering the entry barrier for physically designing a robot. Once the final system has been assembled, the user can save the new design to the database (see Figure 11c).

Exploration
For the exploration of movement capabilities, we chose a capability function model where the parameters θ correspond to power series parameters. The parameters define an intended joint trajectory for all joints of the robot. The intended joint trajectories are passed to a controller which returns the robot specific actions that are necessary to follow the desired trajectory as closely as possible. The controller, whose desired states have been specified by a trajectory, takes the role of the capability function cap in this context. The actions applied to the real system finally result in a capability. This capability function model ensures that the mapping from Θ to the capabilities is locally smooth, i.e., small changes in the parameters lead to small changes in the corresponding trajectory. This is an important property for modelling clusters of trajectories in the Classification and Annotation process (Section 3.3).
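With five power-series parameters per joint, the intended trajectory of each joint is a polynomial in time. A sketch of this parameterization (our naming; the controller that tracks the intended trajectory is omitted):

```python
import numpy as np

def intended_trajectory(theta, duration=4.0, steps=100):
    """theta: (n_joints, 5) power-series coefficients. Returns a
    (steps, n_joints) array of intended joint positions
    q_i(t) = sum_m theta[i, m] * t**m over t in [0, duration]."""
    t = np.linspace(0.0, duration, steps)
    powers = t[:, None] ** np.arange(theta.shape[1])[None, :]  # (steps, 5)
    return powers @ theta.T

# 3-DOF arm -> 3 x 5 = 15 parameters, as in the use case.
theta = np.zeros((3, 5))
theta[0, 1] = 0.1  # joint 0 moves linearly; the other joints stay at 0
traj = intended_trajectory(theta)
```

Because the polynomial depends continuously on its coefficients, small changes in θ produce small changes in the trajectory, giving the local smoothness exploited during clustering.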
To train the validation model, as described in Section 3.2, 10^5 trajectories were generated. Each trajectory has a length of 4 s. Five parameters specify the motion for every degree of freedom, each lying within [−π, π], resulting in 15 parameters for the whole 3-DOF robot arm. The validation model was implemented as a fully connected neural network with four hidden layers and 22,102 parameters in total. Training was performed for 20 training steps with a batch size of 100. After training, the validation model had an accuracy of 96.6 ± 0.4 % on unseen data for predicting whether a parameter set corresponds to a valid motion.

Clustering
For clustering, the same capability set as for training the validation model was used. Without loss of generality, we implemented standard k-means clustering as the clustering strategy [49], choosing 27 clusters in F_start and F_end and 5 clusters in F_dir. As generative models, we implemented neural ordinary differential equation (ODE) based normalizing flows [31], using a batch size of 32, 1000 training iterations, and a network layout of two fully connected hidden layers with 64 neurons each. After 1000 training iterations, the model accuracies are 91 ± 2.1% for F_start, 87 ± 3.3% for F_dir, and 59 ± 4.1% for F_end. Errors were calculated from 20 training runs per model with randomized initial weights. Cluster entities for the robot in this example are generated and stored in the database for the identification of cognitive cores. The Robot Configurator workflow takes advantage of exploration and clustering and allows a user to first construct a robot, which is then automatically explored and annotated.
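The per-feature-space k-means step could look roughly as follows; the feature dimensionalities and the random data are placeholders for the actual explored capability features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Placeholder capability features: end-effector states at trajectory start
# and end, plus a scalar directedness feature, standing in for
# F_start, F_end, and F_dir
f_start = rng.uniform(-1.0, 1.0, size=(5000, 3))
f_end = rng.uniform(-1.0, 1.0, size=(5000, 3))
f_dir = rng.uniform(-1.0, 1.0, size=(5000, 1))

# Each feature space is clustered separately, with the cluster counts
# used in the evaluation
km_start = KMeans(n_clusters=27, n_init=10, random_state=0).fit(f_start)
km_end = KMeans(n_clusters=27, n_init=10, random_state=0).fit(f_end)
km_dir = KMeans(n_clusters=5, n_init=10, random_state=0).fit(f_dir)

# The centroids are the cluster entities later matched against
# behavior model constraints
print(km_dir.cluster_centers_.ravel())
```

The resulting cluster entities (centroids plus member assignments) are what would be stored in the database for cognitive core identification.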

Cognitive Core Creation
Based on the defined behavior models, the cognitive core is instantiated and inherits the label reach (see also Figure 6). One cluster in the constrained feature space F_dir was found, with a centroid value of ≈ 0.9. Since the cluster fulfils the behavior model's minimum and maximum value constraints for this feature space, it is linked to the cognitive core. In the current implementation, particle swarm optimization algorithms are used for the global optimization of the three feature models, and all constraints and feature inputs are weighted equally.
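A minimal global-best particle swarm optimizer illustrates the kind of optimization used here; the toy cost function stands in for an equally weighted combination of the three feature-model costs, and all names and hyperparameters are illustrative:

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100,
        bounds=(-np.pi, np.pi), seed=0):
    """Minimal global-best particle swarm optimization sketch."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))        # positions
    v = np.zeros_like(x)                                     # velocities
    pbest = x.copy()                                         # personal bests
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()                 # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + cognitive + social terms (standard PSO update)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy equally weighted cost over three feature-model outputs
target = np.array([0.9, 0.0, 0.5])
cost = lambda p: float(np.sum((p - target) ** 2))
best, best_val = pso(cost, dim=3)
print(best, best_val)
```

In the actual cycle, the objective would score how well a candidate parameter set satisfies the constraints of all three feature models at once.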

Cognitive Core Annotation
For the annotation step, the cognitive core is executed several times with different variable inputs to F_start and F_end. Videos of the cognitive core's performance are generated and shown to a user, who can confirm the selected labels for the cognitive core or (re)assign labels. For this demonstration the user approves and keeps the label reach, which the cognitive core has inherited from its behavior models.

Solving a Task
Cognitive cores are semantically annotated in order to provide a high-level description, or rather a specification, of their performance. A user is not necessarily interested in designing new systems, but will typically first search for available robots which can solve the task at hand. For this use case, we designed the workflow named Solve My Task, where a user selects a combination of labels from the existing ontology and matches them against existing cognitive cores in the database. Before the user has to make a final choice, the performance of each identified cognitive core can be inspected through the previously rendered videos. Here, the explored reach cognitive core can successfully be retrieved and visualized for the user.
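The label-matching core of such a workflow can be sketched as a simple subset query; the core names, the dictionary layout, and `solve_my_task` are hypothetical stand-ins for the actual database query:

```python
# Hypothetical cognitive core records with their semantic annotations
cores = {
    "core_001": {"labels": {"reach"}, "design": "NewShoppingCart"},
    "core_002": {"labels": {"navigate", "perceive"}, "design": "RoverX"},
}

def solve_my_task(requested, cores):
    """Return all cores whose annotations cover every requested label."""
    return [name for name, core in cores.items()
            if requested <= core["labels"]]

print(solve_my_task({"reach"}, cores))  # → ['core_001']
```

In the full system this query runs against the graph database and the matched cores are then presented to the user together with their rendered videos.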

Solving a Mission
The final use case deals with solving a user's application scenario, in the following referred to as a mission. A mission can range from a single robot action to a complex action plan involving multiple actions that need to be sequenced. A mission with sequential actions can be composed through our web interface, based on a set of predefined, yet generic, actions: grasp, navigate, perceive, pick, reach, release. For this evaluation we select the reach action, which maps to a requirement for cognitive cores whose semantic annotation includes the reach label, so that as an intermediate result the previously identified core can be picked. The cognitive core is linked to the design of the 'NewShoppingCart', which can now be considered a suitable robot system to perform the mission. Therefore, this custom design is the final suggestion of the Q-Rock development cycle for solving this mission.

Discussion
The fundamental idea behind the Q-Rock approach is to integrate and extend existing methods in AI, both on the symbolic and the sub-symbolic level, and to implement a framework that assists users in solving their intended task with an existing or novel robot. To achieve this, a central challenge is a unifying concept and theoretical framework to integrate all components in order to realize the Q-Rock cycle (given in Figure 1). In this paper, we made an essential step by introducing the conceptual framework as a basis for all subsequent work. With the rendered use case, we can already demonstrate that the functional coupling of all the steps in Q-Rock is working. In particular, the example shows that a model of annotated hardware can be used to successfully generate simple robotic capabilities (starting at E1 in the cycle), which can be successfully clustered and annotated to generate a cognitive core. Hereby, a link is established from an exploration on the sub-symbolic level to a representation containing a semantic label, such that a user can provide semantic input on which reasoning is performed. Q-Rock is unique in the way goal-agnostic capabilities are cast into a broader semantic framework, and sub-symbolic and symbolic levels of AI are integrated.
In the following, we will discuss several aspects of the workflow in more detail.
The exploration framework has been implemented and allows automated data generation for robots assembled with the Robot Configurator. The implementation is designed in a modular way, so that it is possible to switch between different capability function models without much effort. In the future, multiple simulation engines could be supported as well. The classification into valid and invalid capabilities with a validation model has also been implemented.
After the training process, the validation model can be queried for whether a capability can be executed by the robot. However, selecting a capability and then checking it is inconvenient for the clustering process. For this reason, as a next step we are looking into inverting this validity information, i.e., mapping all valid (and only the valid) capabilities into a continuous parameter space on which the clustering can operate.
The most challenging problem for the exploration is how these approaches scale to increasingly complex systems. In order to deal with this, we plan to make use of more sophisticated search strategies. As the exploration is supposed to be task-independent, we intend to use intrinsic motivations [16] to explore the search space in a structured manner.
Another challenge is the exploration of perception capabilities, which is considered in the theoretical framework but requires non-trivial environments to perform the exploration. Designing test environments that allow a mostly task-independent exploration is one main challenge, besides the fact that for perception the capability space is even larger than for kinematic and dynamic exploration.
In parallel to the exploration approach, an introspection into failure cases is envisaged via a hierarchical capability checking framework which can (a) detect whether a given action is feasible on the robot and (b) pin-point the reasons of infeasibility. This problem is especially interesting for mechanisms with closed loops, e.g., parallel robots or serial-parallel hybrid robots [50]. We plan to exploit knowledge about the kinematic structure of the robot, its various physical properties, and analytical mappings between different spaces (actuator coordinates, generalized coordinates and full configuration space coordinates) by using HyRoDyn which is under active development in Q-Rock.
The subsequent step of capability clustering and the application of the cognitive core and behavior model formalizations has also been demonstrated successfully, while revealing interesting challenges.
Whereas only hand-crafted behavior models and feature spaces have been tested to date, we aim at a more automated approach in the future. We are actively researching the application of variational autoencoders to world state trajectory data, and how well features found in this way can be semantically interpreted. Furthermore, we are working on the automated extraction of behavior model definitions, both from human demonstration data and from modelling human evaluation functions.
Although our goal is to increase the level of automation in the future, we still see human labelling as a crucial backstop in the cycle to give meaning to the explored data and to introduce steps for revision as part of the bottom-up path.
An interesting theoretical problem is defining where the domain of cognitive cores ends, and the domain of planning begins, i.e. up to which behavioral complexity level cognitive cores can be reasonably defined. The cognitive core formalism is purposefully flexible enough such that planning algorithms can be expressed, thus there is no clear limitation imposed on the formal side.
From the viewpoint of Q-Rock as a whole, we see two major challenges arising for future work. On the one hand, the system requires a rich database of annotated components (i.e., single parts or already simple robots), together with a design flow to create new assemblies, in order to generate significant added value for users. A proposal for modelling components in such a design flow has been developed in the predecessor project D-Rock and is described in Section 3.1. With this, Q-Rock has to be thoroughly tested, such that new robotic devices are created and many cognitive cores are built, which in turn fosters an enriched ontology for interacting with the user. During this process, it is important to evaluate the results of Q-Rock in terms of completeness of found behaviours, as well as stability and robustness of the underlying representations.
A related challenge is the introduction of Q-Rock to a considerable number of users to start forming a community. As a first step towards this goal, we intend to publish parts of Q-Rock as open source, and a strategy for addressing the robotics community is currently being formulated. The more users interact with Q-Rock, and thereby also enrich the database, the more individual users will benefit and the more versatile it will become.
Taking a further step back, the Q-Rock system is part of a greater development cycle in the X-Rock project series. In D-Rock, the groundwork was laid for simplified modelling of robot parts and construction of robots based on well defined interfaces. In Q-Rock, robots are enabled to explore possible behaviors. In future projects, beyond the scope of currently ongoing research, we plan to tackle questions regarding combinations of systems and their respective cognitive cores, behavioral interactions between humans and robots or groups of robots, and finetuning of behaviors for specific contexts.