Heavy-Tailed Distributions, Generalized Source Coding and Optimal Web Layout Design
- Creators
- Zhu, Xiaoyun
- Yu, Jie
- Doyle, John
Abstract
The design of robust and reliable networks and network services has become an increasingly challenging task in today's Internet world. Understanding the characteristics of Internet traffic plays an ever more critical role in achieving this goal. Empirical studies of measured traffic traces have led to the wide recognition of self-similarity in network traffic. Moreover, a direct link has been established between the self-similar nature of measured aggregate network traffic and the underlying heavy-tailed distributions of the Web traffic at the source level. This report provides a natural and plausible explanation for the origin of heavy tails in Web traffic by introducing a series of simplified models for optimal Web layout design with varying levels of realism and analytic tractability. The basic approach is to view the minimization of the average file download time as a generalization of standard source coding for data compression, but with the design of the Web layout rather than the codewords. The results, however, are quite different from standard source coding, as all assumptions produce power law distributions for a wide variety of user behavior models. In addition, a simulation model of more complex Web site layouts is proposed, with more detailed hyperlinks and user behavior. The throughput of a Web site can be maximized by taking advantage of information on user access patterns and rearranging (splitting or merging) files on the Web site accordingly, with a constraint on available resources. A heuristic optimization on random graphs is formulated, with user navigation modeled as Markov chains. Simulations on different classes of graphs as well as more realistic models with simple geometries in individual Web pages all produce power law tails in the resulting size distributions of the files transferred from the Web sites. This again verifies our conjecture that heavy-tailed distributions result naturally from the tradeoff between the design objective and limited resources, and suggests a methodology for aiding in the design of high-throughput Web sites.
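As a rough illustration of the "user navigation modeled as Markov chains" idea in the abstract, the sketch below computes the long-run visit frequencies of a tiny three-page site and the resulting average transfer size per click. It is not the report's simulation model: the transition matrix `P`, the file sizes `sizes_kb`, and the helper names `stationary_distribution` and `mean_transfer_size` are all hypothetical values and names chosen for this example.

```python
# Illustrative sketch only: the site graph, transition probabilities, and
# file sizes below are made-up values, not data from the report.

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeatedly applying pi <- pi P (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def mean_transfer_size(pi, sizes):
    """Average size of the file fetched per click under visit frequencies pi."""
    return sum(p * s for p, s in zip(pi, sizes))


# Hypothetical site: page 0 is the home page, pages 1 and 2 are content pages.
# Row i holds the probabilities of the next click given the user is on page i.
P = [
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]
sizes_kb = [20.0, 50.0, 400.0]  # assumed file size of each page, in KB

pi = stationary_distribution(P)
avg_kb = mean_transfer_size(pi, sizes_kb)
print("visit frequencies:", pi)
print("average KB transferred per click:", avg_kb)
```

Under these assumptions, splitting or merging files changes both `P` and `sizes_kb`, so a layout optimization of the kind the abstract describes could be read as a search over such pairs for the one that minimizes the average cost per click within a resource budget.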
Files
Name | Size |
---|---|
md5:c71b7e7d2c2328bb8b88ede5aef62f62 | 3.4 MB |
Additional details
- Eprint ID
- 28038
- Resolver ID
- CaltechCDSTR:2000.001
- Created
- 2006-07-16 (created from EPrint's datestamp field)
- Updated
- 2019-11-26 (created from EPrint's last_modified field)
- Caltech groups
- Control and Dynamical Systems Technical Reports