6G Networks: Beyond Shannon Towards Semantic and Goal-Oriented Communications

The goal of this paper is to promote the idea that including semantic and goal-oriented aspects in future 6G networks can produce a significant leap forward in terms of system effectiveness and sustainability. Semantic communication goes beyond the common Shannon paradigm of guaranteeing the correct reception of each single transmitted packet, irrespective of the meaning conveyed by the packet. The idea is that, whenever communication occurs to convey meaning or to accomplish a goal, what really matters is the impact that the correct reception/interpretation of a packet is going to have on the goal accomplishment. Focusing on semantic and goal-oriented aspects, and possibly combining them, helps to identify the relevant information, i.e. the information strictly necessary to recover the meaning intended by the transmitter or to accomplish a goal. Combining knowledge representation and reasoning tools with machine learning algorithms paves the way to semantic learning strategies, enabling current machine learning algorithms to achieve better interpretation capabilities and to counter adversarial attacks. 6G semantic networks can bring semantic learning mechanisms to the edge of the network and, at the same time, semantic learning can help 6G networks improve their efficiency and sustainability.


I. INTRODUCTION
Even though 5G networks are still at an early deployment stage, they already represent a breakthrough in the design of communication networks, shaped around their ability to provide a single platform enabling a variety of different services, ranging from enhanced Mobile BroadBand (eMBB) communications to virtual reality, automated driving, Internet-of-Things, etc. Looking at future new uses of technologies, applications, and services, as well as at the recent predictions for the development of new technologies expected for 2030, it is already possible to foresee the need to move Beyond 5G (B5G) and to design new technological enablers for B5G connect-compute networks [1], incorporating new technologies to satisfy future needs at both the individual and societal levels. While some near-future technological solutions will be included in the long-term evolution of 5G, others will require a radical change, leading to the standardization of the new 6th Generation (6G).
The main goal of this paper is to motivate the need, in the design of new 6G networks, for a paradigm shift from the mainstream research, which basically builds on Shannon's framework, towards semantic and goal-oriented communications. In 1948, Shannon established the basis for a mathematical theory of communication, deriving the conditions ensuring the reliable transmission of a sequence of symbols over a noisy channel. In the 70 years that followed, building on Shannon's theory, research on communications has produced a number of significant advancements, including multiple-input multiple-output (MIMO) communications, new waveform designs, mitigation of multiuser interference in both uplink and downlink channels, etc. Furthermore, remarkable progress has been achieved in network design, passing from the initial Internet protocol to today's network traffic engineering, network function virtualization (NFV), software defined networking (SDN) and network slicing.
Today, while deploying the fifth generation (5G) of wireless communication systems and kicking off research on beyond-5G (B5G) future networks [1], the need for a paradigm shift away from Shannon's legacy begins to take shape. The motivation is dictated by the current trend witnessing the demand for wider and wider bandwidths to cope with the ever increasing request for higher data rates to accommodate incoming new services, like virtual reality or autonomous driving. Clearly, this never-ending request is doomed to face, at some point, a bottleneck, because resources are evidently limited or, in some cases, their further increase can induce significant challenges that can ultimately jeopardize the benefits. One example is the increase of carrier frequencies: as frequency increases, there is more room for wider bandwidths, but several undesired phenomena appear, like blocking, atmospheric absorption, power amplifier efficiency drop [2], etc.
Going back to Shannon to analyze the main assumptions of his work, we find that Shannon and Weaver, in 1949, categorized communications in three levels [3]: (i) transmission of symbols (the technical problem); (ii) semantic exchange of transmitted symbols (the semantic problem); (iii) effect of semantic information exchange (the effectiveness problem). Shannon deliberately focused on the technical problem. At that time, this was indeed a very intelligent move, as it enabled him to derive a rigorous mathematical theory of communication based on probabilistic models. However, nowadays, the vision of the network as an enabler of pervasive intelligent services, with a strong emphasis on its effectiveness and sustainability, suggests that assuming semantics as irrelevant is no longer justified. Rather than "how" we transmit, we need to focus on "what" we transmit. Or, quoting John von Neumann, "there's no sense in being precise when you don't even know what you're talking about".
(E. Calvanese-Strinati is with Université Grenoble Alpes, CEA, Leti, F-38000 Grenoble, France; e-mail: emilio.calvanese-strinati@cea.fr. S. Barbarossa is with the Department of Information Engineering, Electronics, and Telecommunications, La Sapienza University of Rome, 00184 Rome, Italy; e-mail: sergio.barbarossa@uniroma1.it. Preprint arXiv:2011.14844v1 [cs.NI], 4 Nov 2020.)
In our vision, taking into account precisely the initial categorization of communication in three levels, as suggested by Shannon and Weaver, we think that 6G can represent a significant leap forward if it incorporates semantics and effectiveness aspects in its design. The new vision is suggested by another visionary giant, Nikola Tesla, who stated, in 1926, that "when wireless is perfectly applied, the whole Earth will be converted into a huge brain". Following this idea, we believe that network design can receive significant hints by observing how the human brain processes the signals perceived from the environment. The brain in fact learns from past actions (and from the culture accumulated by humankind in its history), and takes intelligent decisions, in due time, with a sustainable energy consumption. Mimicking this excellent example provided by nature, we believe that 6G networks should take semantics and effectiveness as central aspects of network design. In this context, focusing on semantics and clearly identifying the goal of communication helps us to distil the data that are strictly relevant to conveying the information intended by the source or to fulfilling a predefined goal. Disregarding irrelevant data then becomes a key strategy to significantly reduce the amount of data to be transmitted and recovered, thus saving bandwidth, delay and energy. According to this view, goal-oriented and semantic communications will be a keystone for exploring the meaning behind the bits and enabling brain-like cognition and effective task execution among distributed network nodes. In this context, information has no value unless it has an exploitable and explainable meaning. This change of perspective represents a fundamental paradigm shift where the success of task execution at destination (effectiveness problem) is the core concern, rather than achieving error-free communications (technical problem).
The paper is organized as follows. In Section II we highlight the new use cases that motivate the need for a new generation of wireless communication networks. In Section III, we argue about the need for moving beyond Shannon and state in what respects we predict a significant advancement. Then, in Sections IV and V, we focus on semantic communications and on goal-oriented communications, respectively. Afterwards, in Section VI, we illustrate the increasing interplay between network design and artificial intelligence, with a focus on bringing intelligent mechanisms to the edge of the network, to enable delay-critical services. Moreover, we stress the importance of merging machine learning with semantics. In Section VII, we dig into smart mechanisms to change the channel's physical properties on demand using reconfigurable intelligent surfaces. Finally, some conclusions are drawn in Section VIII.

II. 6G USE CASES, KPIS, ROADMAPS AND NEW SERVICES
Starting from the first generation of wireless cellular networks (1G) in the 80s, every eight to ten years a new generation of wireless communication systems has been standardized. Moving to the design and engineering of a new generation has been motivated by the ambition to meet new societal challenges and to enable radically new use cases targeting new value creation. A new generation builds on the evolution of technologies already adopted and on a few new technological breakthroughs and new network architectures that enable revolutionary new services. A fundamental question when starting research for the design of a new generation is whether the new generation should be backward compatible or clean slate. This is a never-ending debate. Economic mechanics and experience from the past encourage the view that the next generation should be as backward compatible as possible with previous ones. This avoids huge CAPEX investments for renewing hardware in the network and in the terminals accessing the service. The drawback is that the potential benefits of revolutionary technologies may be radically limited, as such technologies might be either not included in the new standard or not fully exploited. Similar to what happened in the past, technological readiness, combined with continuously evolving economic and societal challenges, as well as new legislation and regulations, creates momentum for innovating, engineering, and standardising new telecommunication generations. Research on future 6G networks has already started. Multiple technological enablers for beyond-5G networks are currently investigated, following roadmaps to enable 6G services by 2030 [1].
On the one hand, the evolution toward beyond-5G networks is shaped by the "classical problem of wireless communications", which is focused on achieving reliable and cost-effective data communication over noisy channels. Already starting with 5G, multiple concurrent key performance indicators (KPIs) have been identified to characterize performance, such as peak data rate, area traffic capacity, connection density, communication reliability, end-to-end latency, spectrum efficiency, energy efficiency, etc. [4]. In 5G, those KPIs have been mapped onto specific use cases and services [5].
According to the roadmap to 6G [1], we are now in the phase of defining tentative KPIs for serving future 6G services in speculatively identified use cases [6]. While the vision of what 6G should be is still evolving, academia, industry and standardization bodies are working on candidate KPIs for future 6G use cases [1] and applications [6], [7]. Some of those services will first be offered with 5G technologies, or their long term evolution (LTE); others will require disruptive technologies and completely new network operations to meet their stringent requirements, following the usual never-ending technology growth model. However, societal and environmental needs are stimulating radical changes in today's economic approach to business and value creation. The societal acceptance of a new technology is already at a critical stage, and future 6G networks are required to take into account societal and environmental issues, rather than just creating new business opportunities and added value for operators, industry and IT companies.
Already today, we are experiencing how society and industry are becoming more and more data-centric, data-dependent, and automated. This phenomenon is expected to intensify in the next decade and beyond, since the fusion of the digital and real worlds and the support of distributed networked intelligence and automation across all dimensions are driving the next technological revolution. The boundary between computer science, artificial intelligence and telecommunications is disappearing, creating the momentum for a plethora of new applications and challenging future 6G networks with the ongoing race between the cost and complexity of delivering new services. The ITU 2030 group published a first speculative vision of future 6G services and use cases [8], identifying the evolution of virtual reality (VR) and mixed reality (MR) services as a main driver for future 6G services. This next frontier for multimedia will include holographic media [9] and multi-sense, including haptic, communication services. Such applications will dominate services not only in the realm of entertainment, teleconferencing and smart working, but will also enable more life-impacting and industrial productivity applications. Those futuristic services will be key, for instance, to the Japanese Society 5.0 vision [10], to remote holographic presence [11], to industrial maintenance in hostile operational environments, and to intelligent production at large. Such a family of new use cases will impose stringent requirements in terms of per-link capacity (for instance, holographic communications employing multiple-view cameras are expected to require several terabits per second (Tb/s) per link in both uplink and downlink [12], [13], which is not supported by 5G) and stringent end-to-end (E2E) latency, to ensure a real-enough virtual and seamless remote experience. 6G also targets services, such as industrial automation, autonomous systems, and massive networks of sensors, in which machines, and not humans, are the endpoint. For such services, communicate-and-compute services will impose new stringent requirements in terms of latency and its jitter, in order to ensure a seemingly deterministic performance of the network [14]. Furthermore, extremely high reliability will be required to improve performance not only at the physical-to-networking layers but also in the inference-based intelligent mechanisms supporting them. Clearly, the specific targets on both communication [13] and inference reliability depend on the specific use case. With 6G, new applications will not be limited to the realm of entertainment and teleconferencing: more disruptive applications begin to emerge, some of which are life-impacting, while others provide alternative solutions for intelligent production and smart mobility, with a multi-dimensional transportation network consisting of all ground-sea-air-space vehicles with peak mobility up to 1000 km/h (see, for instance, the hyperloop transportation system [15]). This multi-dimensional mobility vision also opens the opportunity for 3D-native services, enabling end users and machines moving in 3D space to perceive seamless 6G service support and to teleport cloud functionalities on demand, where and when intelligence support is needed in the 3D space [16]. To this end, KPIs such as localization precision and uniform user experience will be defined in both 2D and 3D.
Following the same conceptual trends as 5G, speculating on the possible requirements of future 6G use cases and applications, and considering the potential offered by new technology enablers, Table II reports tentative 6G KPIs. These KPIs will not have to be achieved all simultaneously, all the time, everywhere, in every possible condition. Instead, a selected subset of KPIs should be attained locally in space and time, depending on future 6G application and service needs, with a high degree of flexibility and adaptivity. The targeted 6G performance improvement in terms of data rate, latency at the radio interface, network energy efficiency, etc. follows the well established performance-driven KPI mechanics: a technology-intense evolution across generations. This translates into imposing a factor of 10 to 100 of KPI improvement between wireless network generations. The rationale is to deal with the expected exponential traffic growth and the more immersive and interactive foreseen services. The most representative trend in today's wireless communications is the unrelenting increase in signal bandwidths to achieve higher link capacity, increasing the network capacity and improving the user's QoE. This is also a major, well accepted mega-trend for 6G: the KPI is to achieve a factor of 100 in capacity by accessing the huge available bandwidth either in the sub-THz D-band (above 90 GHz) or in the visible light spectrum [17], [1], [18]. Up to now, generations of wireless systems have been designed to accommodate the exponential growth of downlink traffic. Nevertheless, starting from 4G, we experience a reduction and sometimes an inversion of the asymmetry between uplink and downlink traffic [19].
Even though there are no precise forecasts on uplink traffic evolution, the traffic pattern change is inevitable. The uplink traffic is exploding at a much faster rate than the downlink traffic. Already with 4G networks, a study by NSN showed how the overall usage ratio between uplink and downlink had already reached approximately 1:2.5 [20] for services like peer-to-peer TV, peer-to-peer sharing, massive IoT and cloud support. This is due to the introduction of cloud support and the rising use of content sharing platforms. A larger share of data is crossing the networks, conveyed on the uplink between a huge number of connected devices, collecting large amounts of data and requiring pervasive support for offloading services to the cloud (computation and storage). Moreover, since 5G, the rising support of machine learning algorithms is causing a further explosion of the uplink traffic. 5G uplink capacity has not been sized to meet such exploding demand over the next decade. In order to accommodate the rising share of uplink traffic, similar capacity requirements are foreseen for uplink and downlink in 6G. In addition, device-to-device (D2D) communication will consume an increasing share of the network capacity, defining a novel layer of communication. A major leap kicked off with 5G is the increasing interplay between communication and computation. With 6G this will be further intensified, having distributed edge intelligence nodes collecting, processing, and storing data. Some identified use cases, such as Industrial IoT or virtual reality, already impose new KPI requirements, such as stringent latency bounds, packet delivery jitter, reliability or achievable throughput, but also regarding the system's dependability, i.e. the ability to make guarantees for a deterministic behaviour [14]. One application is remotely controlled high-precision manufacturing, requiring jitter delays in the order of a microsecond [13]. These requirements are typically quite distinct from those that have traditionally guided the design and deployment of public 5G networks.
Nevertheless, in our view, the most remarkable feature of 6G will not only be the performance improvement in terms of typical KPIs. 6G is going to represent a paradigm shift reflecting a degrowth-by-design approach, leading to the introduction of completely new types of KPIs. The new perspective is to look at the network as a truly pervasive computing system enabling new interactions among humans and machines, and intelligent services with sustainable costs, in economic and ecological terms. 5G already represents a significant step forward in this direction; 6G will take this perspective as its driving paradigm. Within this perspective, new KPIs will come into play, such as inference reliability or energy per goal. Of course, there is no single number for these KPIs, as they will depend on the specific services.
For example, the new reliability requirements of 6G services will not just set a 10^4 factor of reduction on the communication frame error rate. 6G services will also impose new requirements in terms of reliability of the intelligence support to actuation (inference reliability). In 5G, the same level of reliability is imposed on every transmitted bit, since every bit is equally important. With 6G services, shifting the perspective to semantic and goal-oriented communications, different bits may have different relevance in conveying information or in fulfilling a desired goal within a time constraint. Therefore, reliability constraints might vary depending on bit significance. This is a completely new perspective brought by 6G.
With 6G, energy KPIs will not only concern network energy consumption [21] or terminal battery life extension. The ambition is to achieve, wherever possible, battery-free communication, targeting a communication efficiency on the order of 1 pJ/b [22]. Moreover, since 6G operation will be intensively supported by machine learning and artificial intelligence, specific energy constraints will be defined along the data generation-to-processing chain.
In addition, 6G will offer enhanced ground services supported by non-terrestrial networks, as well as new, purely three-dimensional (3D) services [16]. The 3D component is a completely new territory for network design, in particular when aspects and KPIs such as coverage, capacity, reliability, localization accuracy and mobility are to be extended and evaluated in 3D.

III. BEYOND SHANNON?
Until the 5G era, communication has been the basic commodity of every wireless generation. The key challenge has been the reduction of the uncertainty associated with the correct reception of exchanged data, while targeting higher capacity and reliability and lower latency. Such a legacy of Shannon's model has pushed a never-stopping race for broader bandwidths, thus exploring higher frequency bands. Since the deployment of 4G, the energy consumption of networks and wireless devices has limited practical service operation, pushing research to approach the theoretical communication limits established by Shannon, while optimizing the use of available resources. Already with 5G, the communication network has evolved towards a communicate-and-compute system, where the support of the (edge) cloud has fed the cybernetic vision of Norbert Wiener, in which communicate-compute-control tasks generate a continuous loop involving sensing, computing, controlling and actuating, laying the foundations for the birth of intelligent machines. 6G services will induce drastic changes to the conventional notions of knowing and learning, guessing and discovering. This will require significant advances in the communicate-and-compute infrastructure, paving the way to making knowledge and decision a commodity of next generation networks. In such a futuristic context, information accumulates at a rate faster than what can be filtered, transmitted and processed by some kind of intelligence, either natural or artificial. The bottleneck, traditionally represented by the unreliability of the communication medium, is drifting towards the reliability of the decision mechanisms supporting the intelligent interaction between humans, machines and the environment.
Keeping in mind that available resources are inevitably limited, the challenge is to design the new network while respecting a degrowth principle. The question is: can we deliver more intelligent mobile services without necessarily requesting more capacity, more infrastructure (communication, computation, storage), more energy?
In this paper, we argue that a degrowth principle can be pursued and is actually achievable, at least in some applications, if we do not constrain ourselves to design the future communication network under the imperative that every transmitted bit is equally important and should then be reliably recovered at the receiver side. More generally, we claim that a more efficient system design is possible if the design does not rigidly follow Shannon's communication paradigm. In his seminal work, Shannon established the principles for the reliable transmission of symbols over a noisy communication channel [3]. However, today we look at communication as part of a much more complex system involving the interaction between humans and machines having various goals and degrees of intelligence (natural or artificial). In such a qualitatively different context, the semantic and effectiveness aspects become preeminent features that can no longer be neglected. Shannon's framework will of course keep providing the foundations for the reliable exchange of symbols, but the design of the communication system will need to incorporate semantic-related concepts and take into account the specific goal of communication.
We now briefly illustrate the main research axes motivating the need to move beyond Shannon.
• Semantic communications: Communication among humans involves the exchange of information, where the word information is associated with meaning; in conveying a concept from source to destination, the relevant aspect is what is communicated, i.e. the information content, not how the message is brought to the destination. A correct semantic communication occurs if the concept associated with the message sent by the source is correctly interpreted at the destination, which does not necessarily imply that the whole sequence of bits used to transmit the message be decoded without errors.
• Goal-oriented communications: Communication among interacting entities is often carried out to enable the involved entities to accomplish a joint goal. The fundamental system specification is then associated with the goal and its correct accomplishment, within a given time constraint, using a given amount of resources (energy, computation, etc.). The communication system enabling the interactions among the entities involved in the goal should be defined so as to focus on the goal-related specifications and constraints. This means, for example, that all information not strictly relevant to the fulfillment of the goal can be neglected.
• Online learning-based communications: The increasingly pervasive introduction of machine learning tools at all layers of the communicate-compute network yields a further breakthrough in network design. Online machine learning algorithms provide the possibility to reshape traffic, change coding and decoding strategies, scheduling, etc., as a function of an online monitoring of the network.
• Wireless Environment as a Service: The common assumption in wireless communications is that the channel is given and cannot be altered according to the communication needs. However, with the advent of Reconfigurable Intelligent Surfaces (RISs) [23], there is the possibility to adjust the communication channel to control wireless connectivity and mitigate interference. In this way, it is possible, for example, to increase the channel capacity without increasing either the transmit power or the bandwidth, or to reduce the associated electromagnetic field footprint.
In the following sections, we will review the fundamental challenges and opportunities associated with the above major changes.
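As a back-of-the-envelope illustration of the Wireless Environment as a Service point, consider the Shannon capacity C = B log2(1 + SNR). Under the idealized assumption of perfectly coherent reflection, an N-element RIS can scale the received SNR roughly with N^2. The sketch below uses purely illustrative numbers (bandwidth, direct-link SNR, and N are assumptions, not measurements) to show how capacity grows with the same bandwidth and transmit power:

```python
import math

def capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon capacity C = B * log2(1 + SNR), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

B = 20e6          # 20 MHz of bandwidth, kept fixed (illustrative value)
snr_direct = 2.0  # assumed direct-link SNR (3 dB), also illustrative

# Idealized coherent combining: an N-element RIS boosts the received SNR
# roughly by a factor N^2, without raising transmit power or bandwidth.
N = 64
snr_ris = snr_direct * N**2

print(capacity_bps(B, snr_direct) / 1e6)  # ~31.7 Mb/s on the direct link
print(capacity_bps(B, snr_ris) / 1e6)     # substantially higher, same B and power
```

The gain is logarithmic in SNR, so the capacity improvement is smaller than the N^2 SNR boost, but it comes at no extra spectral or power cost in this idealized model.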

IV. SEMANTIC COMMUNICATIONS
In their seminal work, Shannon and Weaver suggested that the broad subject of communication can be organized into three levels [24]:
Level A. How accurately can the symbols of communication be transmitted? (The technical problem.)
Level B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem.)
Level C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem.)
Shannon [3] provided a rigorous and formal solution to the technical problem, laying the foundations of what is today known as information theory. Shannon deliberately left aside all aspects related to semantics and effectiveness. However, as soon as communication starts being perceived as a commodity enabling a variety of new services, interconnecting humans and machines with various degrees of intelligence (natural or artificial), the semantic and effectiveness aspects become preeminent actors that can no longer be neglected. Several schools of thought have proposed alternative approaches to generalize Shannon's information theory, each aimed at emphasizing different perspectives: philosophy of information [25], logic and information [26], information algebra [27], information flow [28], quantum information theory [29], algorithmic information theory [30], [31]. Building on a more general information theory, there are also various proposals concerning the design of a semantic communication system [32], [33], [34], [35]. The authors of [32] addressed the problem of potential "misunderstanding" during communication, where the misunderstanding arises from a lack of initial agreement on what protocol and/or language is being used in communication at the source and destination sides. In this context, "reliable communication" means overcoming any initial misunderstanding between the parties towards achieving a given goal. Other works focus on the transmission of text; in such a case, a semantic error is defined building on the notion of similarity between words expressing the same concept, in a given language [35].
Before delving into the technical problems associated with the definition of a semantic communication system, it is necessary to clarify what we mean by semantics. In very general terms, semantics is associated with meaning, and a genuine theory of information should be a theory about the information content, or meaning, of a message, rather than a theory about the symbols used to encode the messages. To distinguish between the different interpretations of the word information, in the following we will use the term semantic information to refer to information as associated with a meaning, and the term syntactic information, in Shannon's sense, as associated with the probabilistic model of the symbols used to encode or transmit information. As a very simple example, pressing the keys of a computer keyboard at random generates a message that has a high syntactic information content, because the generated symbols are approximately independent and uniformly distributed, but most likely the generated message carries zero semantic information, as it does not deliver any meaningful content to the destination.
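The keyboard example can be made concrete with a few lines of code: the empirical per-character entropy (a purely syntactic measure) of a randomly typed string exceeds that of meaningful English text, even though only the latter carries semantic information. The snippet is an illustrative sketch; the specific strings are arbitrary choices, not data from the paper.

```python
import math
import random
import string
from collections import Counter

def empirical_entropy(text: str) -> float:
    """Empirical Shannon (syntactic) entropy per character, in bits."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
# "Pressing keys at random": symbols roughly independent and uniform.
random_msg = "".join(random.choice(string.ascii_lowercase) for _ in range(5000))
# Meaningful English text: skewed symbol statistics, hence lower entropy,
# yet it is the only message that conveys semantic information.
english_msg = ("the brain learns from past actions and takes intelligent "
               "decisions in due time with sustainable energy consumption ") * 40

print(empirical_entropy(random_msg))   # close to log2(26) ≈ 4.70 bits/char
print(empirical_entropy(english_msg))  # noticeably lower
```

The gap between the two numbers illustrates exactly the decoupling at issue: syntactic information measures statistical surprise, not meaning.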
Semantic information is associated to the level of knowledge available at the destination.In general terms, quoting Dretske [36], "information is that commodity capable of yielding knowledge, and what information a signal carries is what we can learn from it."From this perspective, a semantic communication from source to destination occurs correctly, or with a high degree of fidelity, under the following circumstances: 1) a semantic communication system is required to guarantee the semantic equivalence between the message emitted by the source and the message recovered by the destination, even though source and destination might possess a different background knowledge; 2) the destination is able to increase its level of knowledge thanks to the received message; 3) whenever communication takes place to fulfill a joint goal between source and destination, the correct interpretation of a message is associated to the effectiveness of the message in helping to accomplish the goal.Semantic equivalence means that the meaning intended by the source of the message is equivalent to the meaning understood by its destination.This way of looking at information marks a significant departure with respect to the way information is used in Shannon's information theory, in at least four respects: 1) the amount of information conveyed by a message is associated to its semantic content, and it is not necessarily related to the probability with which the symbols used to encode the message are generated; 2) in semantic communication, what matters is the specific content of each message, and not the average information associated to all possible messages that can be emitted by a source; 3) the amount of information conveyed by a message depends not only on the message itself, but also on the level of knowledge available at source and destination, at the time of communication; 4) whenever communication is only an action performed to accomplish a joint goal, the correctness of the message 
interpretation is associated with the effectiveness achieved in the goal accomplishment thanks to the message. Since semantic communication is related to a correct semantic interpretation of the received messages and, possibly, to a change of knowledge resulting from the acquisition of a message, it is necessary to be able to represent knowledge in a formal way. Knowledge Representation (KR) and reasoning is indeed one of the cornerstones of artificial intelligence [37]. The goal of KR is the study of computational models to represent knowledge by symbols and by defining the relations between symbols, in a way that makes the production of new knowledge possible. Among the many alternative ways to represent knowledge, graph-based knowledge representation plays a key role [38]. An example of graph-based representation is given by a conceptual graph, whose nodes are associated with entities, whereas the edges represent relations among entities [38]. Given the vastness of knowledge, it is unthinkable to represent all knowledge within a single framework. The only viable approach is to build knowledge base (KB) systems associated with specific application domains. For each application domain, a KB is typically composed of a computational ontology, facts, rules, and constraints [38]. A computational ontology provides a symbolic representation of the objects belonging to the application domain, together with their properties and relations. Furthermore, besides the ontology, a KB system includes a reasoning engine, built on the rules and the constraints associated with the given application domain. The goal of the reasoning engine is to answer questions posed within the application domain.
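As a toy illustration of these concepts, a graph-based KB can be sketched as a set of (subject, relation, object) triples together with a minimal reasoning engine that derives new facts from rules. All entities and rules below are invented for the sketch and are not taken from the paper:

```python
# Toy knowledge base: facts stored as (subject, relation, object) triples.
# Entities and rules are hypothetical, chosen only for illustration.
facts = {
    ("robot_arm", "is_a", "actuator"),
    ("actuator", "is_a", "device"),
    ("device", "has_property", "power_consumption"),
}

def infer(facts):
    """One reasoning engine: 'is_a' is transitive, and properties
    propagate down the 'is_a' hierarchy, producing new knowledge."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        snapshot = set(derived)
        for (a, r1, b) in snapshot:
            for (c, r2, d) in snapshot:
                if b == c and r1 == "is_a" and r2 == "is_a":
                    t = (a, "is_a", d)
                elif b == c and r1 == "is_a" and r2 == "has_property":
                    t = (a, "has_property", d)
                else:
                    continue
                if t not in derived:
                    derived.add(t)
                    changed = True
    return derived

kb = infer(facts)
# The engine answers a question whose answer is not stored explicitly:
answer = ("robot_arm", "has_property", "power_consumption") in kb
```

Note how the answer is derived, not stored: the reasoning engine produces knowledge beyond the explicit facts, which is exactly what distinguishes a KB from a database.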
A key aspect of a KB system worth pointing out is that it is not possible to assume that the KB provides a complete picture of the application domain to which it refers. This happens because, even in a restricted domain, each object might have relations with a huge number of other facts or objects, so that it would not be possible to encompass all these relations. As a consequence, the incompleteness of the description is a central feature of a knowledge-based system [38], and it represents a key distinction with respect to a database. Furthermore, the incompleteness of a KB also comes from computational constraints, as a complete reasoning might be very time-consuming. As a consequence of its incompleteness, a KB system might not be able to answer, for example, the question of whether a statement is true or false within a given time interval. Conversely, the need to provide an answer while respecting a time constraint typically results in an answer that is correct only within a certain degree of reliability.
It is worth pointing out that, in general, the KB available at the source, say KB_S, may differ from the KB available at the destination, say KB_D. We say that a message is correctly interpreted at the destination node, according to KB_D, if its interpretation is equivalent to that given at the source node, according to KB_S, or if it induces a valuable modification of the destination KB, either in its ontology or in the definition of the reasoning rules. In the case of graph-based KRs, the change of the KB is reflected in a change of the graph. This change then becomes a possible way to measure the increase of knowledge carried by a message.
A key feature of a KB system is that the inference made on a message should depend only on the semantics, i.e. the meaning, of the message and not on its syntactic form. This means that there could be alternative ways to encode the same concept into formally different sequences of symbols, all of which should give rise to the same semantic representation. As a simple example, the answer to the question "how much is two plus two" could be the sound "four" or the symbol 4 written on a piece of paper. The encoding mechanism and the number of bits necessary to encode the two messages would be totally different, but the semantic information would be exactly the same.
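The point that semantic content is invariant to the syntactic encoding can be sketched in a few lines; the lexicon below is a hypothetical stand-in for a semantic interpreter:

```python
# Hypothetical semantic interpreter: syntactically different messages
# are mapped to one canonical semantic token (here, an integer).
lexicon = {"four": 4, "4": 4, "iv": 4}

def interpret(message: str) -> int:
    """Return the semantic content, ignoring the syntactic form."""
    return lexicon[message.strip().lower()]

a = interpret("four")                 # spoken/written word
b = interpret("4")                    # digit written on paper
same_meaning = (a == b)               # identical semantics...

bits_word = len("four".encode()) * 8  # ...at very different bit costs
bits_digit = len("4".encode()) * 8
```

The two encodings cost 32 and 8 bits respectively at the syntactic level, yet carry exactly the same semantic information.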
Based on the ideas mentioned above, a communication system incorporating the three levels of communication mentioned by Shannon and Weaver can be represented as in Fig. 1. The block diagram shown in Fig. 1 is organized according to the three levels of communication: technical, semantic, and effectiveness. At the effectiveness level, there are two entities, a source S and a destination D, that interact with each other through an environment. Source and destination nodes could be agents where, following AI terminology, an agent is something that can "operate autonomously, perceive the environment, persist over a prolonged time period, adapt to changes, create and pursue goals" [37]. In particular, we consider rational agents, i.e. agents that act so as to achieve the best outcome of their actions. An agent could be a human, a machine, or a piece of software.
The scope of the interaction can be very broad in nature: sensing, controlling, extracting information from the environment, exchanging information, etc. To interact, the source S generates a message w conveying the semantic information that S wishes to share with D. This message w is generated according to the ontology and the rules given by the knowledge system KB_S available at the source. For instance, a concept could be represented, equivalently, by a speech signal or by a text, produced using a given language.
To be physically conveyed to the destination through a physical medium, the message w is translated into a sequence x of symbols, typically bits. This translation generally includes a source encoder, to reduce the redundancy contained in the message, followed by a channel encoder, introducing structured redundancy to increase the communication reliability. Rules and properties of source and channel encoders follow the principles of Shannon information theory. The combination of source and channel encoder is denoted as a syntactic encoder, as it affects only the form of the message, not its semantic content. The sequence x is then transformed into a physical signal, such as an electromagnetic or acoustic wave, to make it well suited to pass through the physical channel available for communication.
At the destination side, the received signal is syntactically decoded to produce a sequence of symbols x'. Ideally, x' should coincide with x, if there are no errors at the physical layer. Finally, the received sequence is interpreted, based on the knowledge system KB_D available at the destination, to produce a message w' that should be equivalent to w. This does not necessarily imply that the structure of w' be identical to the structure of w. What is necessary is only that, once interpreted according to the knowledge base system KB_D available at the destination, the concept extracted from w' be semantically equivalent to that represented by w.
In a communication system, there might be errors at the syntactic level as well as errors at the semantic level: an error at the syntactic level occurs if x' differs from x; an error at the semantic level means that w' is not equivalent to w. Errors at the syntactic level may occur because of random noise or interference introduced during the transfer through the channel, or because of unpredictable channel fluctuations. Errors at the semantic level could be due to differences between the KB systems available at the source and destination nodes, or to some kind of semantic noise, i.e. something that alters the concept emitted by the source, such as fake news.
Clearly, the semantic layer relies upon the syntactic layer: too many errors in the decoding of the received sequence y may preclude the recovery of the source message w. However, and this is the interesting new aspect brought forward by the inclusion of the semantic layer, an error at the syntactic layer does not necessarily imply an error at the semantic layer. The message interpreter can in fact recover the right content even if there are a few errors in decoding the received sequence of symbols. Conversely, there could be errors at the semantic level even if there is no error at the syntactic level. This may happen because of the difference between the KBs available at source and destination, so that a message that has been correctly decoded at the syntactic level can be misinterpreted at the semantic level.
As in any reliable system, feedback plays a key role. If a packet of symbols is affected by errors, i.e. x' ≠ x, the receiver can request the retransmission of that packet through the syntactic feedback channel. This kind of feedback operates at the syntactic level, as it has to do only with the form of the received message. A semantic feedback is also possible, whenever the meaning of the message provided by the semantic interpreter is unclear. The message interpreter at the destination can in fact send a feedback to the semantic message generator at the source side, to request the retransmission of the message w, or maybe a different version of w that facilitates its interpretation at the receiver side. An additional feature of the semantic communication system is the interaction between the different layers. The semantic interpreter at the destination can in fact send a feedback to the syntactic encoder as well, as shown in Fig. 1. For example, the semantic decoder can tell the source encoder to reduce the data rate because the message being received can be easily decoded and predicted (up to a certain time interval) at the semantic level, so that it is not necessary to transmit all the fine details currently sent. In this way, the whole system might achieve the same accuracy in the recovery of the transmitted information while saving important physical resources such as energy and bandwidth.
The interaction between different communication levels paves the way to a new approach to the design of communication systems. Nowadays, communications are designed to ensure that there are no errors at the syntactic level, which means that the sequence of symbols used to encode the message emitted by the source should be correctly received at the destination, irrespective of what is being transmitted, i.e. of the information content encoded in the transmitted message. In a semantic communication system, what matters is that the receiver be able to recover the content of the information sent by the source. There could be errors at the syntactic level that are easily corrected at the semantic level, without requiring the retransmission of the corresponding packets. Going even further, parts of the message may not reach the destination at all, perhaps because of blocking effects at the physical level, as in millimeter-wave communications, but the interpreter may still be able to reconstruct the semantic message, based on a well-tuned prediction model.
A few examples can help illustrate this aspect. In the transmission of a speech signal, for instance, if a number of bits cannot be recovered because the channel has undergone a deep fade during their transmission, what is important is that the sentence is reconstructed correctly. This means that a suitable automatic text correction algorithm, able to reconstruct a meaningful word (or sentence) from a subset of letters (or words), would be able, equivalently, to correct a large number of (syntactic) errors at the bit level, without requiring the retransmission of the erroneous, or missing, packets. A further example is the transmission of a video. Suppose that the video is capturing the scene of a walking person and that, at some point, a number of frames are lost because of a deep channel fade. In such a case, an algorithm running at the receiver side can reconstruct the missing frames using a well-trained prediction model. If no major unexpected change occurs during the channel fade, the overall flow of events captured by the video can be reconstructed, with no apparent harm at the semantic level. In such a case, the message interpreter would be able to reproduce a video that is not necessarily equal to the transmitted video, but is semantically equivalent. Clearly, this approach can produce a significant saving in terms of transmit power and/or bandwidth.
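The video example can be sketched as follows, with linear interpolation acting as a crude stand-in for a trained prediction model; the synthetic "video" and the loss pattern are invented for illustration:

```python
import numpy as np

# A toy "video" of a smoothly moving object; two frames are lost during
# a deep fade, and the receiver fills the gap by prediction instead of
# requesting retransmission.
num_frames, h, w = 10, 8, 8
video = np.zeros((num_frames, h, w))
for t in range(num_frames):
    video[t, 3, t % w] = 1.0  # object moving one pixel per frame

lost = {4, 5}  # frame indices lost in the fade
received = {t: video[t] for t in range(num_frames) if t not in lost}

def reconstruct(t, received):
    """Linear interpolation between the nearest received frames:
    a crude stand-in for a trained prediction model."""
    prev = max(k for k in received if k < t)
    nxt = min(k for k in received if k > t)
    alpha = (t - prev) / (nxt - prev)
    return (1 - alpha) * received[prev] + alpha * received[nxt]

recon = np.stack([received[t] if t in received else reconstruct(t, received)
                  for t in range(num_frames)])

# The reconstruction is not bit-identical to the original video, but the
# overall motion (the "semantics") is preserved at a small pixel error.
mse = float(np.mean((recon - video) ** 2))
```

The reconstructed frames differ from the transmitted ones at the syntactic level, yet the flow of events survives, which is the property the semantic layer cares about.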
The price paid for this advantage is the additional computational complexity at the receive side and, in turn, a further delay, which in some applications could represent a serious bottleneck to the introduction of semantic communication. To reduce the additional delay, in our vision, future communication systems could take valuable suggestions from the way our brains operate. Among the many theories of the human mind, there is a compelling theory, denoted as the predictive mind and supported by experimental evidence, stating that our mind works as a predictive machine that continuously predicts what it is going to experience through its senses and updates its (hierarchical) interpretation models accordingly, learning from prediction errors [39], [40]. This strategy is a mix of bottom-up and top-down operations. As an example, if a speaker is delivering a talk remotely, the receiver, based on its current knowledge of the speaker and the portion of the video received so far, can predict the next frames according to a model (top-down). On the basis of the prediction error, it can then refine the model (bottom-up), and so on. If the physical channel undergoes an abrupt fade, causing the loss of a large number of bits, as long as the prediction model is sufficiently accurate, the receiver may not perceive any loss at the semantic level. Digging even further into the human vision system, there is a school of thought stating that our brain is essentially a prediction machine that constantly attempts to match the incoming sensory inputs with top-down expectations or predictions [41]. This is achieved using a hierarchical generative model that aims to minimize the prediction error within a bidirectional cascade of cortical processing. According to this theory, it is the brain that selects a small subset of the multitude of signals coming from the retina, as a function of what it is expecting. In this way, not all signals produced in the retina travel through the optical nerve. This represents a very efficient way of working, as it saves a lot of energy, and it could be translated into next-generation artificial visual systems.
Interesting examples already incorporating semantic aspects in text transmission are [35] and [42]. In [42], the authors, building on Natural Language Processing (NLP) tools and on DNNs, proposed a deep learning based semantic communication system, named DeepSC, whose goal is to maximize the system capacity and minimize the semantic errors by recovering the meaning of sentences, rather than bit or symbol errors as in traditional communications. The next challenge is to extend this approach to signals requiring higher data rates, such as video, by building on a notion of semantics associated with a sequence of frames.

V. GOAL-ORIENTED COMMUNICATION
An additional framework to achieve drastic advantages with respect to the conventional Shannon model is goal-oriented communication, where communication has a well identified scope: to contribute to the achievement of a goal. In such a case, bits are not all equally informative; it is the goal that highlights which bits are more relevant to its accomplishment. Earlier works on goal-oriented communications are [43], [33] and its extension [32]. In those works, the authors addressed the problem of potential "misunderstanding" among parties involved in a communication, where the misunderstanding arises from a lack of initial agreement on what protocol and/or language is being used in the communication. In this section, we propose a different view, starting from the basic assumption that the communication occurs to fulfil a goal. As a consequence, the performance of the system is specified by the degree of fulfillment of the given goal or, more precisely, by the effectiveness achievable in the fulfillment of the goal, given the amount of resources used to do it. Suppose, for example, that the goal is the inference of some property of an observed system, e.g., the detection of an anomaly in a manufacturing process, whose state is unknown to the observer. To make the inference, the observer exploits a system of sensors that collect data measuring some physical property of the system. The sensors transmit their data to a fusion center that takes a decision on what is being observed. How much information does a sensor need to send to the decision maker? The answer depends on the accuracy with which the decision must be taken, possibly subject to constraints on the time needed to take a decision and on the energy spent by the sensors to collect the data and send them to the fusion center.
Suppose that the goal of the system is to learn a set of parameters θ from a set of observations x_i, i = 1, ..., N. Let us denote by X := {x_i}_{i=1}^{N} the set of measurements. Suppose also that θ is a set of deterministic parameters and that the observations x_i are outcomes of a vector random variable described by a probability density function (pdf) p(x; θ), with vector parameter θ. In our goal-oriented communication scheme, a sensor collects the data set X and sends the data to a fusion center that has to estimate the parameter vector θ from X. If the goal is to estimate θ with a target accuracy level, it might not be necessary to send all the vectors in X; it is sufficient to send a function of X, say T(X), such that the accuracy in the estimation of θ from T(X) is the same as that achievable using the observation X directly. The question is then: Given a vector random variable x, described by the pdf p(x; θ), does there exist a function T(x) that guarantees no accuracy loss and provides some gain in the communication setup? The answer is given by the sufficient statistics of x. From basic statistical signal processing, we know that T(x) is a sufficient statistic for θ, given x, if the pdf p(x; θ) can be factorized as

p(x; θ) = g(T(x); θ) h(x),

where g depends on x only through T(x) and h does not depend on θ. In general, there might exist more than one sufficient statistic. What is important in our setting is to identify the minimal sufficient statistic. A statistic T(x) is a minimal sufficient statistic relative to p(x; θ) if it is a function of every other sufficient statistic. In words, a minimal sufficient statistic maximally compresses the information about the vector parameter θ contained in the observed samples.
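A minimal numerical sketch of the no-loss property: for Gaussian samples with known variance, the sample mean is a sufficient statistic for the unknown mean, so sending one scalar instead of N samples costs nothing in estimation accuracy. The numbers below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.5                 # unknown mean the fusion center must estimate
N = 10_000
X = rng.normal(theta_true, 1.0, size=N)   # full data set at the sensor

# For a Gaussian with known variance, the sample mean is a sufficient
# statistic for the mean: one scalar summarizes all N samples.
T = float(X.mean())

theta_from_X = float(X.mean())   # ML estimate computed from the raw data
theta_from_T = T                 # ML estimate computed from the statistic alone

no_loss = abs(theta_from_X - theta_from_T) < 1e-12
compression = N                  # N transmitted values reduced to a single one
```

The estimate obtained from T(X) is identical to the one obtained from X, while the sensor transmits one value instead of ten thousand.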
If the parameter vector θ is itself a multivariate random vector, it is possible to compute the mutual information between θ and x, and between θ and a sufficient statistic T(x). From information theory, it is well known that T(x) is a sufficient statistic for θ if

I(θ; x) = I(θ; T(x)).

This means that, if the goal of communication is estimating the parameter vector θ from the observation of a data set X, there is no loss of information in sending T(X) instead of X. What is the advantage? The advantage is that the entropy of T(X) can be much smaller than the entropy of X. This means that the number of bits necessary to encode T(X) may be much smaller than the number of bits needed to encode X. As a consequence, the number of bits to be transmitted can be significantly decreased, with no loss in terms of inference. This simple example can be generalized to highlight the idea that, if the goal of communication is to perform some inference on the data, it is not necessary to transmit the data as they are; it is more convenient to transmit a function of the data, which depends on the goal of the inference. In this way, what is sent is only what is really relevant for the action to be performed at the receiver side. Using this strategy, the data can be significantly compacted while leaving the performance of the inference method unaltered. Let us consider, for example, a set of sensors measuring a number of physical parameters, with the goal of controlling a manufacturing process to detect, and possibly predict, some anomalous behavior. The observation is a data set composed of M discrete-time signals x_i(n), i = 1, ..., M, n = 1, ..., N, observed for N time instants. If these time series are modeled as Gaussian random processes, denoting by X the whole data set and by x(n) the column vector, of size M, collecting the measurements gathered by all the sensors at time index n, the minimal sufficient statistics are the sample mean vector and the sample correlation matrix

T_1(X) = (1/N) Σ_{n=1}^{N} x(n),    T_2(X) = (1/N) Σ_{n=1}^{N} x(n) x(n)^T.

The advantage in transmitting the two statistics T_1(X) and T_2(X), instead of X, is that their entropy is much lower than the entropy of X. Hence, using a lossless encoder, the minimum number of bits necessary to send the sufficient statistics is much lower than the minimum number of bits necessary to send X. What is important to stress is that this reduction in the amount of data to be transmitted is achieved without affecting the accuracy of the final estimation. This advantage is achieved because the goal at the receiver is not to reconstruct the transmitted data, but to estimate the parameter vector θ.
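A quick count makes the compression concrete. With the (hypothetical) sizes below, the two statistics replace M·N raw values with M + M(M+1)/2 values, since the sample correlation matrix is symmetric:

```python
import numpy as np

M, N = 4, 1000                      # 4 sensors, 1000 time samples each
rng = np.random.default_rng(1)
X = rng.normal(size=(M, N))         # raw Gaussian measurements

T1 = X.mean(axis=1)                 # sample mean vector: M values
T2 = (X @ X.T) / N                  # sample correlation matrix: symmetric MxM

scalars_raw = M * N                               # 4000 values to send
scalars_stats = M + M * (M + 1) // 2              # 4 + 10 = 14 values
compression_ratio = scalars_raw / scalars_stats   # hundreds of times fewer
```

Only 14 scalars cross the channel instead of 4000, and the fusion center can still compute exactly the same estimate of θ.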
The above example is a simple case used to show how the definition of a goal in the design of the communication strategy can help reduce the data rate used to transmit data from peripheral sensing nodes to edge servers, without harming the final inference capabilities. In general, the pdf of the data may not be available, so that it is not immediately clear how to build a sufficient statistic of the data. Nevertheless, a rather general setup is offered by deep neural networks (DNNs) used to make inference on the observed data. In such a case, we can measure the mutual information between the input and each internal layer of the network. Experimental results reveal that the first layers of a DNN operate some kind of compression over the course of training [44]. This reduction is driven by progressive geometric clustering. Building on such a behavior, we can think of splitting the DNN, keeping a certain number of layers at the source and the remaining layers at the destination, so as to reduce the number of bits to be transmitted from source to destination, while maintaining roughly the same overall performance.
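The split-DNN idea can be sketched with a tiny two-layer network. The weights are random placeholders standing in for a trained model; the point is only that the device-side layers shrink what must cross the channel:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical "pre-trained" weights of a small MLP (random here,
# standing in for a network actually trained for the inference task).
W1 = rng.normal(size=(64, 8)) / 8    # device-side layer: 64 inputs -> 8 features
W2 = rng.normal(size=(8, 3)) / 3     # server-side layer:  8 features -> 3 classes

def device_part(x):
    """Layers executed at the source: compress 64 inputs to 8 features."""
    return np.maximum(0.0, x @ W1)   # ReLU

def server_part(h):
    """Layers executed at the destination, on the received features."""
    return h @ W2

x = rng.normal(size=64)              # raw sensor observation
h = device_part(x)                   # only 8 numbers cross the channel
y = server_part(h)

values_sent = h.size                 # 8 instead of 64
full_pipeline = server_part(device_part(x))  # identical end-to-end output
```

Transmitting the intermediate features instead of the raw input cuts the payload by 8x in this toy setting, with the end-to-end output unchanged, mirroring the compression observed in the early layers of trained DNNs.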
A use case that is gaining increasing attention in edge computing is the extraction of video analytics, possibly in real time. In such a case, the definition of a goal can help reduce the data rate between a peripheral video camera and the edge server where the video analytics are extracted. One possibility is, for example, to perform a preliminary filtering to remove all frames that are not relevant for the ensuing video analysis. As an example, FFS-VA is a pipelined system for multistage video analytics, based on three stages [45]: an initial stage used to remove the frames containing only background; a stream-specialized network model used to identify target-object frames; and a model to remove the frames whose target objects are fewer than a threshold.
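A pipeline in the spirit of FFS-VA (not its actual implementation; the thresholds and the "detector" are invented placeholders) can be sketched as:

```python
import numpy as np

# Sketch of a multi-stage frame filter in the spirit of FFS-VA:
# thresholds and the object "detector" are invented placeholders.
background = np.zeros((8, 8))

def stage1_is_background(frame, tol=1e-3):
    """Stage 1: drop frames that only contain the static background."""
    return np.abs(frame - background).max() < tol

def stage2_count_objects(frame, level=0.5):
    """Stage 2: crude stand-in for a detector, counting bright pixels."""
    return int((frame > level).sum())

def keep_frame(frame, min_objects=2):
    """Stage 3: keep only frames whose object count reaches a threshold."""
    if stage1_is_background(frame):
        return False
    return stage2_count_objects(frame) >= min_objects

frames = [np.zeros((8, 8)) for _ in range(5)]     # background-only frames
busy = np.zeros((8, 8))
busy[2, 2] = busy[5, 5] = 1.0                     # frame with two "objects"
frames.append(busy)

kept = [f for f in frames if keep_frame(f)]
# Only the informative frame survives; five of six never hit the link.
```

Filtering at the camera side means that only goal-relevant frames consume uplink bandwidth and server compute.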
The last example suggests that, if we merge semantic and goal-oriented communications, we may find methods to send only the semantic information that is strictly relevant to the achievement of the goal. In this way, we have a further way to reduce the amount of data to be sent, without affecting the accuracy/reliability in the goal accomplishment.

VI. SEMANTIC LEARNING IN 6G NETWORKS
The grand vision beyond 6G is that, whereas all previous network generations have been designed by humans, the design of next generation networks will see a significant contribution from machines, driven by a pervasive introduction of artificial intelligence at the edge of the network, as close as possible to the end users [46], [47]. Bringing intelligence to the edge of the network meets a twofold need [48]: i) optimize the utilization of network resources by learning network-related parameters and predicting future events (machine learning for communication, computation, and caching); and ii) design the communication layer to enable the distributed implementation of machine learning algorithms operating under tight delay constraints (communication for machine learning). To support this view, 6G will have to be an AI-native network, meaning that the network will be designed to facilitate the introduction of learning tools that will reshape the network according to requirements and constraints [49].
In this paper, we build on this vision while stressing, at the same time, that a true leap forward can be achieved by merging machine learning with semantics. As before, the interplay between learning and semantics will produce a twofold advantage: i) semantic communication, with its widespread use of knowledge representation systems, will facilitate the development of machine learning algorithms that exploit semantic features to improve their learning capabilities, facilitate disambiguation by exploiting context information, and increase robustness against adversarial attacks; ii) machine learning will help semantic communication algorithms to better understand what the relevant information is, thus further improving effectiveness and efficiency. In the following sections, we elaborate on these ideas.
A. Orchestrating C^4 resources as parts of one system
Introducing intelligence to enable new services, such as intelligent manufacturing, autonomous driving, or virtual reality, to cite a few, requires taking smart decisions within tight delay constraints and respecting the jitter bound needed to enforce a deterministic decision chain. To meet this demand, edge computing represents an emerging paradigm that pushes computing tasks and services from the (possibly distant) cloud to the edge of the network. An efficient design of edge computing should enable the end users, either humans or machines, to access computational and storage resources with very low service delays.
The service delay, i.e. the time elapsed between the instantiation of a request and its fulfillment, typically involves a communication delay between the involved parties, a computation delay, and possibly the time needed to access storage units containing relevant data. Furthermore, if the goal of communication is the control and actuation of delay-critical procedures, it is also necessary to include a further delay associated with control and actuation. As a consequence, imposing a service delay induces a coupling between communication, computation, caching, and control (C^4) resources. This coupling motivates a joint C^4 design, to achieve an effective resource orchestration [50].
In this C^4 context, computation offloading will play a key role in extending the capabilities of peripheral devices to take smart decisions, by offloading their computational requests to nearby mobile edge hosts (MEHs).
However, to satisfy economy-of-scale requirements, the network design must take into account that edge resources, like computing and storage capabilities, will necessarily be limited, so that it is imperative to optimize their usage. For example, in computation offloading the service delay includes a communication delay and a computation delay. Hence, it makes sense to optimize the use of computation and communication resources jointly, as suggested for example in [51], in a static multi-user setup where multiple small cells are served by a single edge computing host. A dynamic joint optimization algorithm that schedules the communication and computational resources optimally was suggested in [52], [53]. In a further extension to the case where, in each slot, the optimizer may not know exactly the state of the system, and hence the objective function to be optimized, one may resort to online convex optimization (OCO) algorithms, as suggested in [54]. Besides communication and computation, a further coupling in the C^4 framework is between communication and caching. In many applications, it is necessary to cache the desired contents on demand and, possibly, in a proactive way, to meet delay constraints. Proactive caching will be considered in Section VI-D. The additional novelty is that, in the C^4 context, caching will involve not only popular content, but also the software needed to run user applications remotely, as close as possible to the end user.
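The coupling between communication and computation delays can be made concrete with a toy offloading rule; all rates, data sizes, and CPU speeds below are hypothetical:

```python
# Toy offloading decision: execute a task locally, or offload it to a
# mobile edge host (MEH)? All numbers are hypothetical placeholders.
def service_delay_offload(bits, rate_bps, cycles, meh_speed_hz):
    """Communication delay (uplink transfer) plus remote computation delay."""
    return bits / rate_bps + cycles / meh_speed_hz

def service_delay_local(cycles, device_speed_hz):
    """Pure computation delay when the task runs on the device."""
    return cycles / device_speed_hz

task_bits = 2e6            # 2 Mbit of input data to upload
task_cycles = 5e9          # 5 Gcycles of computation
rate = 100e6               # 100 Mbit/s uplink
device_cpu = 1e9           # 1 GHz device CPU
meh_cpu = 20e9             # 20 GHz aggregate edge host

t_local = service_delay_local(task_cycles, device_cpu)                   # 5.0 s
t_offload = service_delay_offload(task_bits, rate, task_cycles, meh_cpu) # 0.27 s
offload = t_offload < t_local   # here the joint delay favors offloading
```

The decision flips as the uplink rate or edge load changes, which is precisely why communication and computation resources must be allocated jointly rather than in isolation.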

B. Machine learning for wireless networks
Proactive mechanisms require the availability of prediction mechanisms and, more generally, sophisticated machine learning strategies, possibly running at the edge of the network. In the last decade, in the field of supervised learning, deep neural networks (DNNs) have been shown to provide performance even better than human capabilities in some general purpose applications, like sound recognition or image classification [55]. In particular, convolutional neural networks (CNNs) play a key role, thanks to their intrinsic sparsification of the connections from one layer to the next, achieved by exploiting the structure of convolutional operators, well suited for images and sound, i.e. signals residing on regular grids. More recently, the objective has been the extension of DNN architectures to work on data that do not reside on a regular grid, but rather on graphs, whose topology captures some of the intrinsic (pairwise) relations among the observed signals. Mixing graph-based input representation with learning is a problem addressed in [56].
When applying supervised learning methods to communication networks, there is a great opportunity to train the receiver by exploiting all the data packets that are correctly decoded, which then serve as examples of labeled data. In this case, the learner can exploit a huge number of labeled examples, even if there is no human assigning the labels. In supervised learning, there is typically a clear separation between the learning and the testing phase. However, in a wireless context where the channel varies over time, it is more interesting to look for online learning mechanisms, where the learning and testing phases are intertwined and evolve over time. Among online algorithms, we may distinguish between reinforcement learning methods, where an agent learns by acting and observing the results of its actions, without assuming an a priori model of the observations, and stochastic optimization methods, where a dynamic procedure progressively adapts its online actions, derived from an online optimization that, step by step, exploits whatever knowledge is available at the moment about the involved variables.
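The intertwining of learning and testing can be sketched with an LMS-style online estimator tracking a slowly drifting channel gain; the drift model and step size are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Online learning sketch: at every step the receiver first predicts
# (the "test"), then updates its model from the error (the "learning"),
# so the two phases are intertwined rather than separated.
T = 2000
h_true = 1.0 + 0.2 * np.sin(2 * np.pi * np.arange(T) / 500)  # drifting gain
x = rng.normal(size=T)                  # known pilot symbols
y = h_true * x + 0.01 * rng.normal(size=T)

h_est = 0.0
mu = 0.1                                # LMS step size (hypothetical choice)
errs = []
for t in range(T):
    y_pred = h_est * x[t]               # test: predict before observing
    e = y[t] - y_pred                   # prediction error
    h_est += mu * e * x[t]              # learn: LMS update from the error
    errs.append(e * e)

late_mse = float(np.mean(errs[-500:])) # error after initial convergence
```

Because the estimator keeps adapting, it tracks the time-varying channel without ever freezing into a fixed "trained" model, which is the behavior one wants over a non-stationary wireless link.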
In general, when using supervised learning, the price for achieving a high performance level is the need for a huge number of labelled data. This means that, typically, in supervised learning, humans still play a fundamental role in providing the labelled examples. Conversely, in unsupervised learning, there are no labelled examples to use. In such a case, the goal of the learner is to find patterns in the data, such as clusters, and then classify the observations according to the features of the detected patterns. Graph-based representation methods are a fundamental tool for clustering, as evidenced in spectral clustering methods. However, graph-based approaches only capture pairwise relations among entities, e.g., time series, and in many applications, such as biological networks, pairwise relations are not able to extract all the information. A further advancement towards the incorporation of multiway relations has been carried out in [57], introducing topological signal processing (TSP). Merging TSP with DNNs has the potential to unravel important information from complex data sets.
Examples of application of machine learning tools to the physical layer of a communication system have already been studied, under unknown channels, using an autoencoder (AE) [58], [59], a recurrent neural network (RNN) [60], and a generative adversarial network (GAN) [61]. Extensions to higher layers, including network slicing and orchestration, are presented in [62], where the authors propose an AI framework for cross-slice admission and congestion control that considers communication, computing, and storage resources simultaneously to maximize resource utilization and operator revenue.

C. Federated learning
The pervasive introduction of learning tools at the edge of the network poses a number of challenges. In a framework where multiple devices produce a wealth of data to be used to extract analytics, it is clear that some form of collaborative learning can boost the performance of learning algorithms. Collaborative learning typically requires the exchange of data, but this approach raises critical issues because of the privacy concerns of sharing data among users. A viable approach that has recently received significant interest is federated learning [63], [64], where the learning of the model parameters is distributed over remote devices, while the data are kept locally. In centralized federated learning, the devices do not send their data to any remote server. They only share local estimates with a central processing unit, either a data center or a MEC host. Under quite broad assumptions, each device can boost its performance without exchanging data, thus preserving privacy. A possible problem formulation in federated learning is the following:

$$\min_{w} \; \sum_{i=1}^{N} p_i \, f_i(x_i; w)$$

where $f_i(x_i; w)$ is the empirical loss function of device $i$, $w$ is the (global) parameter vector to be learned (e.g., a regressor or the weights of a DNN), $x_i$ are the data collected by device $i$, and $p_i$, with $p_i \geq 0$ and $\sum_{i=1}^{N} p_i = 1$, is a coefficient that weights the importance of the data collected by user $i$ in the estimation of $w$. In the simplest setting, the weights can be chosen as $p_i = n_i / \sum_{j=1}^{N} n_j$, where $n_i$ is the number of examples observed by the $i$th machine. In federated learning, an iterative procedure is implemented where, at each iteration, instead of sending the local data $x_i$, each device sends its local estimate $\hat{w}_i[n]$ (or the gradient of its local empirical loss with respect to the parameter vector) to a fusion center, which sends back an updating term that takes into account the information received from all cooperating nodes. Under quite broad conditions, this strategy converges to the global optimum [64].
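The iterative procedure just described can be sketched as a minimal federated-averaging loop (illustrative only, not the exact scheme of [63], [64]; data sizes, noise level, learning rate, and round counts are all assumed): each device runs local gradient steps on its own data and sends only its local estimate to a fusion center, which returns the $p_i$-weighted average.

```python
import numpy as np

# Illustrative federated-averaging sketch for a linear regression task.
# Each device i keeps its data (X, y) local and shares only its parameter
# estimate w_i; the fusion center aggregates with weights p_i = n_i / sum(n).
rng = np.random.default_rng(2)
w_true = np.array([2.0, -1.0])           # ground-truth parameter (assumed)
n = [50, 100, 150]                       # n_i: samples per device
p = np.array(n) / sum(n)                 # importance weights p_i
w_global = np.zeros(2)
for _ in range(10):                      # communication rounds
    local = []
    for n_i in n:
        X = rng.standard_normal((n_i, 2))
        y = X @ w_true + 0.1 * rng.standard_normal(n_i)
        w_i = w_global.copy()
        for _ in range(20):              # local gradient steps on local data
            w_i -= 0.1 * X.T @ (X @ w_i - y) / n_i
        local.append(w_i)
    # Fusion center: weighted average of local estimates, no raw data moved
    w_global = sum(p_i * w_i for p_i, w_i in zip(p, local))
print("federated estimate:", w_global)
```

Note that only the two-dimensional estimates cross the network, never the $n_i$ local samples, which is the privacy-preserving property emphasized above.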
The above setting is appealing for its simplicity, but it also faces a number of challenges, namely heterogeneity in communication channels, in local devices' behaviors, and in models. More specifically, in a practical setting, the communication channels between the local devices and the fusion center may vary significantly across devices, in terms of data rate, latency, and blocking probability. This heterogeneity alters the updating rule at the fusion center and can thus impact the final accuracy and the convergence time. Similarly, some devices can be faulty or provide data with high delays, which again impacts the convergence time. Finally, there is a model heterogeneity, implying that there is no single globally optimal estimate w fitting all local needs; rather, there are different devices, or groups of devices, for which there is a better estimate w_k, which does not necessarily coincide with the best estimate of another group of devices. In such a case, a valid improvement is represented by multi-task federated learning [65].

D. Intelligent content delivery at the edge
To reduce the time elapsed between the instantiation of a request and the content delivery, dynamic caching is expected to play a key role. In edge-caching-based networks, a large amount of popular content can be pre-fetched and stored by the edge facilities, such as access points or mobile edge hosts, making a substantial portion of the data visible, ubiquitous, and very close to the UEs. In particular, proactive caching policies, which populate local storage disks based on estimated demand, are a key enabler. The optimization variables in dynamic caching are [66]: cache deployment, deciding where to deploy the caches; content caching, deciding which files are put in each cache; and content routing, deciding which paths will carry the right content to the right place. Clearly, the goodness of a proactive caching policy relies upon the accuracy of prediction algorithms able to incorporate the variability of content requests over space and time. So far, caching policies have been fundamentally aimed at moving content throughout the network. In the semantic communication framework highlighted in this work, what needs to be moved is not only content but also knowledge base systems and virtual machines, able to run applications on demand as close to the end user as possible. Migrating virtual machines is a topic that has received significant attention, but a big leap forward is still needed to reduce migration times. This involves, for example, the use of lightweight virtual machines, such as containers, to reduce the amount of data to be migrated. In general, proactive caching of virtual machines poses an interesting and challenging problem in terms of computation and caching, especially for delay-sensitive applications. More generally, the distributed implementation of machine learning algorithms accessing distributed content poses a number of challenges, namely communication bandwidth, stragglers' (i.e., slow or failing nodes) delays, and privacy and security bottlenecks. A new concept that can alleviate some of the above bottlenecks in large-scale distributed computing is coded computing, which utilizes coding theory to effectively inject and leverage data/computation redundancy [67]. More specifically, a method called Coded Distributed Computing (CDC) has recently been proposed in [67], which injects redundant computations across the network in a structured manner.
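The straggler-mitigation idea behind coded computing can be illustrated with a toy (3, 2) code for distributed matrix-vector multiplication (an illustrative sketch of the general principle, not the CDC scheme of [67]; the workload split and worker names are assumptions): the matrix is split into two blocks, three workers compute coded partial products, and any two finished workers suffice to recover the full result.

```python
import numpy as np

# Illustrative coded-computing sketch: tolerate one straggler out of three
# workers when computing A @ x. Workers compute A1 x, A2 x and (A1 + A2) x;
# any two of the three results are enough to decode the full product.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
A1, A2 = A[:2], A[2:]
tasks = {"w1": A1 @ x, "w2": A2 @ x, "w3": (A1 + A2) @ x}  # worker outputs

def decode(done):
    """Recover A @ x from any two finished workers."""
    if "w1" in done and "w2" in done:
        return np.concatenate([done["w1"], done["w2"]])
    if "w1" in done:            # w2 straggles: A2 x = (A1 + A2) x - A1 x
        return np.concatenate([done["w1"], done["w3"] - done["w1"]])
    return np.concatenate([done["w3"] - done["w2"], done["w2"]])  # w1 straggles

# Suppose worker w2 is a straggler and never returns:
y = decode({"w1": tasks["w1"], "w3": tasks["w3"]})
print(np.allclose(y, A @ x))
```

The redundant computation at w3 is the "structured injection of redundancy": one extra worker's worth of computation buys tolerance to any single slow or failed node.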

E. Semantic machine learning
Machine learning is a data-driven approach that learns and uncovers patterns from examples. Learning thus occurs by induction (bottom-up). Conversely, humans learn (abstract) models from experience and from the culture accumulated through time by humankind. They then use these abstract models to interpret what they perceive, to plan actions, and possibly to build further models and check their validity. Being purely inductive, machine learning tools are apparently unbiased. However, the patterns learned by machines are sometimes only brittle, surface-level visual characteristics of the observed data. In object recognition, for example, the learner typically segments the image to better interpret it. However, it is sometimes precisely this act of segmenting that, by taking things out of context, may induce big ambiguities. This is one of the reasons why current deep learning systems may sometimes be so brittle and easy to fool despite their uncanny power [68]: they search for correlations in data, rather than meaning, but meaning is much more than correlation.
Conversely, humans put their observations into context, connecting them to external domain knowledge and properly reasoning about what they are seeing, thus looking for meaning.
We believe that machine learning will make a significant leap forward when it properly incorporates external world knowledge and context into its decision-making processes. 6G networks can facilitate the merger of machine learning and knowledge base systems. Semantic communication will in fact push for the distribution of knowledge base systems across the network, to enable semantic interpretation. Within this semantically enriched context, new machine learning algorithms can benefit from the inclusion of knowledge representation and reasoning schemes. At the same time, semantic learning offers more capabilities to further optimize the use of network resources, focusing on semantic and goal-oriented aspects.

VII. WIRELESS ENVIRONMENT AS A SERVICE
Radical technological advances based on emerging Reconfigurable Intelligent Surfaces (RISs) [23] are today offering the opportunity to forge a new generation of dynamically programmable wireless propagation environments, with minimal redesign and reconfiguration costs for the connect-compute network. A RIS can act as a transmitter, a receiver, or an anomalous reflector, where the direction of the reflected wave is no longer specular according to natural reflection laws, but is instead adaptively controllable. This will offer unprecedented opportunities to support, locally, dynamic adaptation to stringent and highly varying 6G service requirements such as momentary link capacity, localization accuracy, energy efficiency, electromagnetic field emission, and secrecy guarantees. RISs are artificially controlled intelligent surfaces composed of hundreds or thousands of reconfigurable unit elements. RIS technology can be embedded in objects of the environment, such as walls, mirrors, and ceilings, and operates as a nearly passive tunable anomalous reflector, or as a transmitter/receiver when equipped with active radio-frequency elements. RISs operate today at low frequencies, but research is actively designing solutions to support wideband operation up to the sub-THz spectrum. A RIS can be implemented using a variety of technologies and, through its ability to modify radio wave propagation, can provide extraordinary benefits for diverse goal-oriented wireless communications. Different antenna technologies can be adopted to design a RIS, including reflect-arrays [69], transmit-arrays [70], [71], and smart, programmable, or software-defined metasurfaces [72], [73].
Although RISs have great potential to implement advanced electromagnetic wave manipulations, several fundamental and implementation problems are still unsolved. At the physical layer, only simple functionalities, such as electronic beamsteering and multi-beam scattering, have been demonstrated in the literature. In addition, problems such as channel state information estimation and acquisition, passive information transfer, and transceiver design are still open. At the network layer, the propagation settings of installed RISs might be adapted depending on the scenario, on application needs, and on real-time/predicted network dynamics. As of today, open challenges remain on how to define a network architecture incorporating multiple RISs, and how to orchestrate the reconfiguration of multiple RIS devices in time and space, to meet suitable (goal-oriented) deployment strategies for effectively exploiting RIS technology. Such RIS network adaptation capabilities should make it possible to dynamically program the wireless propagation environment while meeting specific legislation and regulation requirements on spectrum use and electromagnetic field emission, which might vary for specific locations and evolve over time. Finally, it is still an open question under what conditions a RIS-empowered network can provide a significant reduction of the overall network energy consumption.
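The electronic beamsteering functionality mentioned above can be illustrated with a standard passive-beamforming sketch (an illustrative toy model, not from the paper; the i.i.d. Rayleigh cascaded channel and the number of elements are assumptions): with cascaded channel coefficients $h_i$ and $g_i$ per element, setting the phase shift of element $i$ to $\theta_i = -\angle(h_i g_i)$ makes all reflected paths add coherently at the receiver.

```python
import numpy as np

# Illustrative RIS passive-beamforming sketch. The end-to-end channel is
# sum_i h_i * exp(j*theta_i) * g_i; aligning each element's phase to the
# cascaded channel h_i * g_i maximizes the received power, in contrast to
# a random (unconfigured) phase profile.
rng = np.random.default_rng(4)
N = 256                                           # RIS unit elements (assumed)
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
theta = -np.angle(h * g)                          # per-element phase shifts
aligned = np.abs(np.sum(h * np.exp(1j * theta) * g)) ** 2
random_cfg = np.abs(np.sum(h * np.exp(1j * rng.uniform(0, 2 * np.pi, N)) * g)) ** 2
print(f"coherent vs. random configuration power gain: {aligned / random_cfg:.1f}x")
```

The coherent configuration scales the received power with $N^2$ rather than $N$, which is the basic argument for RIS energy efficiency, although, as noted above, whether this translates into a net reduction of overall network energy consumption remains open.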

VIII. CONCLUSIONS
In this paper, we have proposed a new vision of 6G wireless networks, in which semantic and goal-oriented communications are the key actors of a paradigm shift, with respect to the common Shannon framework, that has the potential to bring enormous benefits. Increased effectiveness and reliability can be attained without necessarily increasing bandwidth or energy, but rather by identifying the relevant information, i.e., the information strictly necessary to enable the receiver to extract the intended meaning correctly or to actuate the right procedures for achieving a predefined goal. This approach capitalizes on the largely untapped capabilities of communication, computation, and caching systems, on one side, and on knowledge representation tools, on the other, to distill the relevant information from the rest, selectively transmitting, processing, inferring, and remembering only information relevant to the goals defined by the interacting parties.
The new philosophy breaks with the usual trend of providing more and more resources, e.g., energy or bandwidth, to enable more and more sophisticated services. This break is at the core of a vision that regards sustainability as the key property of future networks.
The challenge brought by the new approach is the implementation of distributed computing mechanisms able to learn and extract meaning from data, exploiting proper knowledge representation systems, and to identify and exploit the strictly relevant information in goal-oriented communications. In this new framework, learning can greatly benefit from the introduction of semantic aspects, passing from a purely inductive strategy to an interplay of inductive and deductive mechanisms: learning from examples, but also building abstract models that guide subsequent learning, similarly to the way the human brain works.