
The rapid development of high-speed railways (HSRs) places stringent requirements on the supporting communication system. Millimeter wave (mmWave) is a promising solution thanks to its wide bandwidth, narrow beams, and rich spectrum resources. However, because of the large number of antenna elements employed, energy-efficient solutions at mmWave frequencies are in great demand. For a mmWave HSR communication system with multiple mobile relays (MRs) on top of the train, a dynamic power-control scheme for train-ground communications is proposed. The scheme follows the regular movement pattern of high-speed trains and considers three phases of train movement: the train enters the cell, all MRs are covered by the cell, and the train leaves the cell. The transmit power is further refined according to the number of MRs in the cell and the distance between the train and the remote radio head. The transmit power is allocated to the MRs by minimizing energy consumption under transmitted-data and transmit-power-budget constraints, using a multiplier penalty function-based algorithm. Comprehensive simulation results, which take velocity estimation error into account, demonstrate the effectiveness of the proposed scheme over several baseline schemes.
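
As a rough illustration of the allocation step, the minimal sketch below distributes transmit power across MRs by minimizing energy under a data-demand and power-budget constraint with a simple quadratic penalty method. All numerical values (bandwidth B, noise power N0, channel gains g, data demand D_req, budget P_max) are assumed placeholders, and the penalty loop is only a generic stand-in for the paper's multiplier penalty function-based algorithm.

```python
# Hedged sketch: penalty-based power allocation across mobile relays (MRs).
# Every constant below is an illustrative assumption, not a value from the paper.
import numpy as np
from scipy.optimize import minimize

B = 100e6          # per-MR bandwidth [Hz] (assumed)
N0 = 1e-12         # noise power [W] (assumed)
slot = 1e-3        # slot duration [s] (assumed)
g = np.array([2e-10, 1.5e-10, 1e-10, 0.8e-10])  # assumed channel gains for 4 MRs
D_req = 4e5        # bits that must be delivered within the slot (assumed)
P_max = 4.0        # total transmit-power budget [W] (assumed)

def energy(p):
    return slot * p.sum()

def data_bits(p):
    return slot * (B * np.log2(1.0 + p * g / N0)).sum()

def penalized(p, mu):
    # quadratic penalty on violated constraints (data demand, power budget)
    viol_data = max(0.0, D_req - data_bits(p))
    viol_pow = max(0.0, p.sum() - P_max)
    return energy(p) + mu * (viol_data ** 2 + viol_pow ** 2)

p = np.full(len(g), P_max / len(g))        # start from an equal allocation
for mu in (1e-6, 1e-4, 1e-2, 1.0):         # gradually tighten the penalty
    res = minimize(penalized, p, args=(mu,), bounds=[(0.0, P_max)] * len(g))
    p = res.x
print("per-MR power [W]:", np.round(p, 4), "| demand met:", data_bits(p) >= D_req)
```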

Related content

The U.S. electrical grid has undergone substantial transformation with increased penetration of wind and solar -- forms of variable renewable energy (VRE). Despite the benefits of VRE for decarbonization, it has garnered some controversy for inducing unwanted effects in regional electricity markets. In this study, we examine the effect of VRE penetration on system electricity price and price volatility based on hourly, real-time, historical data from six Independent System Operators (ISOs) in the U.S., using quantile and skew t-distribution regressions. After correcting for temporal effects, we find that an increase in VRE penetration is associated with a decrease in system electricity price in all ISOs studied, and with a decrease in temporal price volatility in five of the six ISOs studied. The relationships are non-linear. These results are consistent with modern portfolio theory, in which a diverse set of volatile assets can lead to a more stable and less risky portfolio.
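
As a small illustration of the regression methodology, the sketch below fits quantile regressions of price on VRE penetration with hour-of-day controls using statsmodels. The synthetic data and column names (vre_share, hour, price) are assumptions made purely for demonstration; the study itself uses real ISO data and also skew t-distribution regressions, which are not shown here.

```python
# Hedged sketch: quantile regression of hourly price on VRE penetration.
# The synthetic data below only mimics the qualitative pattern for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "vre_share": rng.uniform(0, 0.6, n),     # VRE fraction of generation (assumed)
    "hour": rng.integers(0, 24, n),          # hour of day, used as a temporal control
})
# synthetic price: lower with higher VRE share, plus a diurnal pattern and noise
df["price"] = (40 - 25 * df["vre_share"]
               + 5 * np.sin(2 * np.pi * df["hour"] / 24)
               + rng.standard_normal(n) * 8)

# median (q=0.5) and upper-tail (q=0.9) regressions, controlling for hour of day
for q in (0.5, 0.9):
    fit = smf.quantreg("price ~ vre_share + C(hour)", df).fit(q=q)
    print(f"q={q}: VRE coefficient = {fit.params['vre_share']:.2f}")
```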

Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter-inefficient, as it always yields an entirely new model for each task. Many recent works therefore propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their fully fine-tuned counterparts. However, such methods are still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How should the tunable parameters be chosen? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of these methods are in fact sparse fine-tuned models and conduct a novel theoretical analysis of them. We show that the sparsity effectively imposes a regularization on the original model by controlling the upper bound of the stability, and that this stability leads to better generalization capability, which has been empirically observed in many recent works. Although our theory grounds the effectiveness of sparsity, how to choose the tunable parameters remains an open problem. To better choose them, we propose a novel Second-order Approximation Method (SAM) that approximates the original problem with an analytically solvable optimization function; the tunable parameters are then determined by directly optimizing the approximation. Experimental results show that the proposed SAM outperforms many strong baseline models and also corroborate our theoretical analysis.
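
To make the idea of selecting a sparse set of tunable parameters concrete, the hedged sketch below scores parameters with a diagonal, Fisher-style curvature proxy (squared gradients) and keeps only the top fraction for updates. This is a generic second-order-flavored heuristic, not the paper's exact SAM formulation; loss_fn, batch, and the sparsity level are assumed interfaces and values.

```python
# Hedged sketch of sparse fine-tuning: choose a small mask of tunable weights
# from a diagonal curvature proxy, then update only those weights.
# This is NOT the paper's SAM algorithm, just an illustrative stand-in.
import torch

def select_tunable_mask(model, loss_fn, batch, sparsity=0.005):
    """Return a {name: bool mask} keeping the top `sparsity` fraction of weights."""
    model.zero_grad()
    loss_fn(model, batch).backward()            # assumed loss interface
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            scores[name] = p.grad.detach() ** 2  # diagonal Fisher-style importance proxy
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(sparsity * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {name: (s >= threshold) for name, s in scores.items()}

def apply_sparse_step(model, masks, lr=1e-4):
    """SGD step that updates only the selected (unmasked) parameters."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                p -= lr * p.grad * masks[name]
```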

We consider a user-centric scalable cell-free massive MIMO network with a total of $LM$ distributed remote radio unit antennas serving $K$ user equipments (UEs). Many works in the current literature assume $LM\gg K$, which enables high per-UE data rates but leaves the system short of its maximum sum throughput. We provide a new perspective on cell-free massive MIMO networks, investigating rate allocation and the UE density regime in which the network makes use of its full capability. The system reaches its largest sum throughput when the UE density is approximately $K \approx \frac{LM}{2}$. However, when more than $\frac{LM}{2}$ UEs are served simultaneously, a significant fraction of UEs experience relatively low throughput. We therefore propose to reduce the number of active UEs per time slot, so that the system does not operate at ``full load'', and to impose throughput fairness among all users via a scheduler designed to maximize a suitably defined concave, componentwise non-decreasing network utility function. Our numerical simulations show that the system can be tuned so that a desired distribution of the UE throughput, depending on the utility function, is achieved.
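
As one concrete example of maximizing a concave, componentwise non-decreasing utility, the sketch below runs a proportional-fair scheduler (log-utility) that activates only k_active UEs per slot. The exponential rate model and all parameter values are illustrative assumptions, not the paper's cell-free massive MIMO rate expressions.

```python
# Hedged sketch: proportional-fair scheduling with a limited number of active
# UEs per slot. Rates and constants are placeholders for illustration only.
import numpy as np

def schedule(inst_rates, avg_thr, k_active):
    """Pick the k_active UEs with the largest proportional-fair metric r / T."""
    metric = inst_rates / np.maximum(avg_thr, 1e-9)
    return np.argsort(metric)[-k_active:]

rng = np.random.default_rng(0)
num_ue, k_active, beta = 40, 16, 0.01        # assumed: 40 UEs, 16 served per slot
avg_thr = np.zeros(num_ue)
for _ in range(2000):
    inst_rates = rng.exponential(1.0, num_ue)     # stand-in for achievable rates
    served = schedule(inst_rates, avg_thr, k_active)
    slot_rate = np.zeros(num_ue)
    slot_rate[served] = inst_rates[served]
    avg_thr = (1 - beta) * avg_thr + beta * slot_rate   # exponential throughput average
print("min/median/max long-term throughput:",
      np.round([avg_thr.min(), np.median(avg_thr), avg_thr.max()], 3))
```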

All-gather collective communication is one of the most important communication primitives in parallel and distributed computation, playing an essential role in many HPC applications such as distributed deep learning (DL) with model and hybrid parallelism. To address the communication bottleneck of All-gather, optical interconnection networks can provide unprecedentedly high bandwidth and reliability for data transfer among distributed nodes. However, most traditional All-gather algorithms are designed for electrical interconnects and do not map well onto optical interconnect systems, resulting in poor performance. This paper proposes an efficient scheme, called OpTree, for the All-gather operation on optical interconnect systems. OpTree derives an optimal $m$-ary tree corresponding to the optimal number of communication stages, achieving minimum communication time. We further analyze and compare the communication steps of OpTree with existing All-gather algorithms. Theoretical results show that OpTree requires far fewer communication steps than existing All-gather algorithms on optical interconnect systems. Simulation results show that OpTree reduces communication time by 72.21%, 94.30%, and 88.58%, respectively, compared with three existing All-gather schemes: WRHT, Ring, and NE.
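
To illustrate why an optimal tree arity exists, the sketch below evaluates a simple cost model in which each stage of an $m$-ary-tree All-gather pays a fixed optical-circuit reconfiguration cost plus $(m-1)$ sequential transmissions, and then picks the arity that minimizes total time. The cost model and its constants are assumptions for illustration, not OpTree's actual analysis.

```python
# Hedged sketch: choosing the tree arity m for a tree-based All-gather on an
# optically switched system, under an assumed per-stage cost model.
def stages(n_nodes: int, m: int) -> int:
    """Number of stages needed for an m-ary tree to reach n_nodes."""
    s, reach = 0, 1
    while reach < n_nodes:
        reach *= m
        s += 1
    return s

def comm_time(n_nodes: int, m: int, t_reconfig: float = 25.0, t_step: float = 2.0) -> float:
    # each stage: one optical circuit reconfiguration and up to (m - 1) sequential sends
    s = stages(n_nodes, m)
    return s * t_reconfig + s * (m - 1) * t_step

def best_arity(n_nodes: int) -> int:
    return min(range(2, n_nodes + 1), key=lambda m: comm_time(n_nodes, m))

for n in (64, 256, 1024):
    m = best_arity(n)
    print(f"N={n}: best m={m}, stages={stages(n, m)}, time={comm_time(n, m):.1f}")
```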

With the rapid development of high-speed railway systems and railway wireless communication, the adoption of the ultra-wideband millimeter wave band is an inevitable trend. However, the millimeter wave channel suffers from large propagation loss and is easily blocked. Moreover, the link between the base station (BS) and the train is vulnerable to eavesdropping. As an emerging technology, the reconfigurable intelligent surface (RIS) can achieve passive beamforming by steering the incident electromagnetic wave toward a desired direction. We propose a RIS-assisted scheduling scheme for scheduling interrupted transmissions and improving quality of service (QoS). In the proposed scheme, an RIS is deployed between the BS and multiple mobile relays (MRs). By jointly optimizing the beamforming vector and the discrete phase shifts of the RIS, constructive interference between the direct-link and indirect-link signals is achieved, while the channel capacity of eavesdroppers is kept within a controllable range. As a result, the number of successfully scheduled tasks is maximized while their QoS requirements are satisfied. Extensive simulations demonstrate that the proposed scheme outperforms four baseline schemes from the literature in terms of the number of completed tasks and the system secrecy capacity.
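
As a toy illustration of discrete RIS phase-shift selection, the sketch below aligns each reflecting element's phase with the direct BS-MR path and quantizes it to a finite phase set, so that the reflected and direct signals add constructively. The random channels, RIS size, and per-element alignment heuristic are assumptions; the paper's joint beamforming and phase-shift optimization with secrecy constraints is not reproduced here.

```python
# Hedged sketch: align discrete RIS phase shifts with the direct link so the
# cascaded path adds constructively. Channels are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_elements, n_bits = 64, 2                   # assumed RIS size and phase resolution
phase_set = 2 * np.pi * np.arange(2 ** n_bits) / (2 ** n_bits)

h_d = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)   # direct link
h_r = (rng.standard_normal(n_elements)
       + 1j * rng.standard_normal(n_elements)) / np.sqrt(2)               # cascaded link

# ideal continuous phase aligns each cascaded element with the direct path,
# then quantize to the nearest value in the discrete phase set
ideal = np.angle(h_d) - np.angle(h_r)
diff = np.abs(np.exp(1j * ideal)[:, None] - np.exp(1j * phase_set)[None, :])
theta = phase_set[np.argmin(diff, axis=1)]

gain_random = np.abs(h_d + np.sum(h_r * np.exp(1j * rng.choice(phase_set, n_elements)))) ** 2
gain_aligned = np.abs(h_d + np.sum(h_r * np.exp(1j * theta))) ** 2
print(f"random phases: {gain_random:.2f}, aligned phases: {gain_aligned:.2f}")
```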

Inspired by several delay-bounded mission-critical applications, this paper investigates chase-combining-based hybrid automatic repeat request (CC-HARQ) protocols to achieve high reliability under strict delay constraints. A salient feature of our approach is to use the end-to-end delay constraint to compute the total number of ARQs permitted in the network, and then to distribute them optimally across the nodes so as to minimize the packet-drop probability (PDP), which is the end-to-end reliability metric of interest. Since the chase-combining strategy combines the received packets across multiple attempts, the PDP of the network depends on the coherence time of the intermediate wireless channels. As a result, we address the question of computing the optimal allocation of ARQs for CC-HARQ strategies under both slow-fading and fast-fading scenarios. For both channel conditions, we derive closed-form expressions for the PDP and then formulate several optimization problems for minimizing the PDP under a given delay bound. Using extensive theoretical results on the local minima of the optimization problems, we synthesize low-complexity algorithms to obtain near-optimal ARQ distributions. In addition to extensive simulation results that validate our findings, a detailed end-to-end delay analysis shows that the proposed CC-HARQ strategies outperform known Type-1 ARQ-based strategies in several scenarios.
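
As a simplified illustration of distributing an ARQ budget across hops, the sketch below greedily assigns each additional attempt to the hop that most reduces the end-to-end PDP, assuming independent per-attempt failures (closer in spirit to the fast-fading case) rather than the paper's chase-combining outage expressions. The per-hop failure probabilities and the total budget are made-up values.

```python
# Hedged sketch: split an end-to-end ARQ budget across hops to minimize the
# packet-drop probability (PDP), assuming independent per-attempt failures.
def pdp(eps, q):
    """End-to-end PDP: a packet is dropped if any hop exhausts its q_i attempts."""
    p_ok = 1.0
    for e, qi in zip(eps, q):
        p_ok *= 1.0 - e ** qi
    return 1.0 - p_ok

def greedy_allocate(eps, total_arqs):
    """Give each extra attempt to the hop where it reduces the PDP the most."""
    q = [1] * len(eps)                       # every hop needs at least one attempt
    for _ in range(total_arqs - len(eps)):
        best = min(range(len(eps)),
                   key=lambda i: pdp(eps, q[:i] + [q[i] + 1] + q[i + 1:]))
        q[best] += 1
    return q

eps = [0.3, 0.1, 0.2]                        # assumed per-attempt failure prob. per hop
q = greedy_allocate(eps, total_arqs=8)
print("ARQs per hop:", q, "| PDP:", round(pdp(eps, q), 6))
```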

Transformer models have achieved state-of-the-art performance across various application domains and have gradually become the foundation of advanced large deep learning (DL) models. However, training these models efficiently over multiple GPUs remains challenging due to the large number of parallelism choices. Existing DL systems either rely on manual effort to craft distributed training plans or apply parallelism combinations within a very limited search space. In this work, we propose Galvatron, a new system framework that incorporates multiple popular parallelism dimensions and automatically finds the most efficient hybrid parallelism strategy. To explore this extremely large search space, we 1) use a decision tree to decompose and prune it based on reasonable intuitions, and then 2) design a dynamic programming search algorithm to generate the optimal plan. Evaluations on four representative Transformer workloads show that Galvatron automatically performs distributed training under different GPU memory budgets. Across all evaluated scenarios, Galvatron consistently achieves higher system throughput than previous approaches with limited parallelism support.
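
A minimal sketch of the dynamic-programming idea is shown below: it picks one parallelism strategy per layer to minimize estimated time under a GPU memory budget. The strategy set, per-layer costs, and memory discretization are invented for illustration and are not Galvatron's actual cost model or search space.

```python
# Hedged sketch: layer-wise DP over parallelism strategies under a memory budget.
# All costs and strategies below are assumed placeholders.
from functools import lru_cache

# (name, time per layer [ms], memory per layer [GB]) -- assumed values
STRATEGIES = [("data-parallel", 10.0, 4.0),
              ("tensor-parallel", 14.0, 2.0),
              ("pipeline-stage", 12.0, 3.0)]

def plan(num_layers: int, mem_budget_gb: float, mem_step: float = 0.5):
    budget = int(mem_budget_gb / mem_step)

    @lru_cache(maxsize=None)
    def dp(layer: int, mem_left: int):
        if layer == num_layers:
            return 0.0, ()
        best = (float("inf"), ())
        for name, t, m in STRATEGIES:
            cost = int(round(m / mem_step))
            if cost <= mem_left:
                rest_t, rest_plan = dp(layer + 1, mem_left - cost)
                if t + rest_t < best[0]:
                    best = (t + rest_t, (name,) + rest_plan)
        return best

    return dp(0, budget)

total_time, choice = plan(num_layers=8, mem_budget_gb=24.0)
print(f"estimated time {total_time:.1f} ms, plan: {choice}")
```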

The aim of this paper is to characterize the impact of non-orthogonal multiple access (NOMA) on the age of information (AoI) of grant-free transmission. In particular, a low-complexity form of NOMA, termed NOMA-assisted random access, is applied to grant-free transmission in order to illustrate the two benefits of NOMA for AoI reduction, namely increased channel access and reduced user collisions. Closed-form analytical expressions for the AoI achieved by NOMA-assisted grant-free transmission are obtained, and asymptotic studies are carried out to demonstrate that the use of the simplest form of NOMA is already sufficient to reduce the AoI of orthogonal multiple access (OMA) by more than 40%. In addition, the developed analytical expressions are shown to be useful for optimizing the users' transmission attempt probabilities, which are key parameters for grant-free transmission.
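
As a rough, simulation-based illustration of why NOMA-assisted random access can lower the AoI, the sketch below compares the average AoI under OMA (a slot succeeds only with a single transmitter) against a two-power-level scheme in which one user per level can be decoded via successive interference cancellation. The collision/decoding model and the per-slot update rule are deliberate simplifications, not the paper's analytical setup.

```python
# Hedged sketch: Monte-Carlo AoI comparison of OMA vs. a simplified
# NOMA-assisted random access model (not the paper's analysis).
import random

def simulate(num_users=10, p_tx=0.1, slots=100_000, noma=False, seed=1):
    rng = random.Random(seed)
    age = [0] * num_users
    total_age = 0
    for _ in range(slots):
        tx = [i for i in range(num_users) if rng.random() < p_tx]
        delivered = set()
        if noma:
            # each transmitter picks one of two power levels; a level with a
            # single occupant is assumed decodable via SIC
            levels = {}
            for i in tx:
                levels.setdefault(rng.randint(0, 1), []).append(i)
            for users in levels.values():
                if len(users) == 1:
                    delivered.add(users[0])
        elif len(tx) == 1:
            delivered.add(tx[0])
        for i in range(num_users):
            # generate-at-will assumption: age drops to 1 after a delivery
            age[i] = 1 if i in delivered else age[i] + 1
            total_age += age[i]
    return total_age / (slots * num_users)

print("OMA  avg AoI:", round(simulate(noma=False), 1))
print("NOMA avg AoI:", round(simulate(noma=True), 1))
```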

Federated learning (FL) is a distributed model training paradigm that preserves clients' data privacy. It has gained tremendous attention from both academia and industry. FL hyper-parameters (e.g., the number of selected clients and the number of training passes) significantly affect the training overhead in terms of computation time, transmission time, computation load, and transmission load. However, the current practice of manually selecting FL hyper-parameters imposes a heavy burden on FL practitioners because applications have different training preferences. In this paper, we propose FedTune, an automatic FL hyper-parameter tuning algorithm tailored to applications' diverse system requirements in FL training. FedTune iteratively adjusts FL hyper-parameters during FL training and can be easily integrated into existing FL systems. Through extensive evaluations of FedTune for diverse applications and FL aggregation algorithms, we show that FedTune is lightweight and effective, achieving 8.48%-26.75% system overhead reduction compared to using fixed FL hyper-parameters. This paper assists FL practitioners in designing high-performance FL training solutions. The source code of FedTune is available at //github.com/DataSysTech/FedTune.
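
The hedged sketch below conveys the flavor of iterative hyper-parameter adjustment: it hill-climbs over the number of selected clients and local training passes against a weighted overhead objective. The overhead model, preference weights, and decision rule are assumptions for illustration; FedTune's actual algorithm is described in the paper and repository.

```python
# Hedged sketch: iterative FL hyper-parameter tuning against a weighted
# overhead objective. The cost functions below are made up for this sketch.
import random

def round_overhead(num_clients, local_epochs, prefs):
    """Weighted overhead of one FL round over (comp time, trans time, comp load, trans load)."""
    comp_time = local_epochs * 2.0 + random.uniform(0, 0.2)
    trans_time = 1.0 + 0.05 * num_clients
    comp_load = num_clients * local_epochs * 1.0
    trans_load = num_clients * 0.5
    a, b, c, d = prefs                       # application preference weights (assumed)
    return a * comp_time + b * trans_time + c * comp_load + d * trans_load

def tune(prefs, rounds=50):
    nc, ep = 10, 2                           # initial hyper-parameters (assumed)
    best = round_overhead(nc, ep, prefs)
    for _ in range(rounds):
        # try neighbouring configurations and keep any that lowers the overhead
        for d_nc, d_ep in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (max(1, nc + d_nc), max(1, ep + d_ep))
            cost = round_overhead(*cand, prefs)
            if cost < best:
                nc, ep = cand
                best = cost
    return nc, ep, best

print(tune(prefs=(0.5, 0.3, 0.1, 0.1)))
```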

Privacy and security have rapidly emerged as priorities in system design. One powerful solution for providing both is privacy-preserving computation (PPC), where functions are computed directly on encrypted data and control can be provided over how data is used. Garbled circuits (GCs) are a PPC technology that provides both confidential computing and control over data use. The challenge is that they incur significant performance overheads compared to plaintext execution. This paper proposes a novel garbled circuit accelerator and compiler, named HAAC, to mitigate these overheads and make privacy-preserving computation more practical. HAAC is a hardware-software co-design. GCs are exemplars of co-design, as programs are completely known at compile time, i.e., all dependences, memory accesses, and control flow are fixed. The design philosophy of HAAC is to keep the hardware simple and efficient, maximizing the area devoted to our proposed custom execution units and other circuits essential for high performance (e.g., on-chip storage). The compiler can leverage its program understanding to realize the hardware's performance potential by generating effective instruction schedules and data layouts and by orchestrating off-chip events. By taking this approach, we achieve ASIC performance/efficiency without sacrificing generality. Insights of our approach include how co-design enables expressing arbitrary GC programs as streams, which simplifies hardware and enables complete memory-compute decoupling, and the development of a scratchpad that captures data reuse by tracking program execution, eliminating the need for costly hardware-managed caches and tagging logic. We evaluate HAAC with VIP-Bench and achieve a speedup of 608$\times$ in 4.3mm$^2$ of area.
