Changeset 2785

Show
Ignore:
Timestamp:
10/30/09 15:33:38 (4 weeks ago)
Author:
stevedh
Message:

@ 12

Location:
docs/Lowthane/ipsn10
Files:
3 modified

Legend:

Unmodified
Added
Removed
  • docs/Lowthane/ipsn10/design.tex

    r2784 r2785  
    9696%% packets is volatile and potentially inaccurate.  Instead, w 
    9797 
    98 When presented with a new link, {\lowthane} initially use a {\emph{spot}} link quality metric, {\it e.g.} a hardware LQI estimate, to evaluate the quality of the link.  For each link, {\lowthane} maintains a Confidence value, currently the number of transmissions that have used that link.  Once this value crosses a certain threshold, the link is considered {\emph{mature}}, and is then evaluated using the overall link cost metric, ETX or expected transmissions.We use link-layer acknowledgement frames on all unicast traffic, and base our link quality estimate on the ack reception rate: consequently, our link estimator strongly prefers bidirectional links.  To account for the temporal variability of links, {\lowthane} maintains both long-term and short-term link estimates.  Both estimators use the same metric (ETX) but with different time horizons so as to manage the agility/stability tradeoff. 
     98When presented with a new link, {\lowthane} initially use a {\emph{spot}} link quality metric, {\it e.g.} a hardware LQI estimate, to evaluate the quality of the link.  For each link, {\lowthane} maintains a Confidence value, currently the number of transmissions that have used that link.  Once this value crosses a certain threshold, the link is considered {\emph{mature}}, and is then evaluated using the overall link cost metric, ETX or expected transmissions.We use link-layer acknowledgement frames on all unicast traffic, and base our link quality estimate on the ack reception rate: consequently, our link estimator strongly prefers bidirectional links.  To account for the temporal variability of links, {\lowthane} maintains both long-term and short-term link estimates.  % Both estimators use the same metric (ETX) but with different time horizons to manage the agility/stability tradeoff. 
    9999%When presented with 
    100100%a new link, {\lowthane} initially uses a {\emph{spot}} link quality metric, 
     
    127127{\emph{Willingness}} value, which indicates the degree to which the 
    128128advertising node is willing to forward traffic for other nodes.  Such a metric 
    129 accounts for a heterogeneous network, in which battery-powered and mobile 
    130 nodes would rather shift forwarding responsibilities to mains-powered or stationary nodes. 
     129accounts for a heterogeneous network, and allow battery-powered and mobile 
     130nodes to shift forwarding responsibilities to mains-powered or stationary nodes. 
    131131 
    132132\subsubsection{Default Route Formation}\label{sssec:default-route} 
     
    173173 
    174174Topology reports are sent to the {\controller} periodically using the default 
    175 route; they are opportunistically piggybacked on data traffic whenever an 
    176 application generates sufficiently frequent traffic.  The unpacked datagram in 
     175route; they are opportunistically piggybacked on data traffic whenever 
     176application traffic is sufficiently frequent.  The unpacked datagram in 
    177177Figure \ref{fig:arch} shows how topology information is added to data traffic 
    178178using an optional extension header; using this mechanism, the overhead is only 
     
    201201allows state to be installed in the network to optimize active flows.  A 
    202202{\controller} uses Route Install messages to update a node's Flow Table.  When a node 
    203 receives a Route Install, it inserts or updates an entry in the Flow Table.  Each Route 
     203receives a Route Install, it creates or updates an entry in the Flow Table.  Each Route 
    204204Install has two parts: 
    205205 
    206 \begin{itemize} 
     206\begin{itemize}\addtolength{\itemsep}{-0.5\baselineskip} 
    207207\item{{\bf{Flow Match:}} The criteria used to determine whether a 
    208208  given packet matches a flow table entry.  By default {\lowthane} uses 
     
    306306data, combined with their policy configuration.  Policy rules must currently 
    307307be hard-coded into the route computation engine which runs Dijkstra's 
    308 shortest-path over the link state.  We envision in the future adding a policy 
    309 specification language similar to the packet classification languages used by 
    310 {\tt packetfilter} or {\tt iptables}. 
     308shortest-path over the link state.  % We envision in the future adding a policy 
     309% specification language similar to the packet classification languages used by 
     310% {\tt packetfilter} or {\tt iptables}. 
    311311 
    312312\subsection{Multiple {\Controller}s} 
     
    363363their contents have been discussed in previous sections.  First, The Default 
    364364Route Table is used to maintain a stable back-channel to a {\controller}; its 
    365 consistency is maintained by using a standard tree-formation protocol.%   When 
     365consistency is maintained by using the default route formation algorithm.%   When 
    366366% inconsistencies are detected, broadcasts are started to resolve them. 
    367367 
     
    371371by listening to updates for same amount of time as the slowest topology 
    372372update.  Due to link and node failures, the LSDB may contain false information. 
    373 If allowed to persist, this can be extremely damaging since nodes may be 
    374 instructed to continuously attempt to send messages across a non-existent 
    375 link, wasting energy and increasing latency.  If a link fails on the default 
     373% If allowed to persist, this can be extremely damaging since nodes may be 
     374% instructed to continuously attempt to send messages across a non-existent 
     375% link, wasting energy and increasing latency.   
     376If a link fails on the default 
    376377route, local repair mechanisms reroute the packet around the failure, and 
    377378update the Default Route Table.  If a link fails otherwise, the packet must 
     
    379380route.  In this case, the packet will be re-routed to a {\controller} using the 
    380381default route DAG.  The {\controller} notes the broken link in the source 
    381 header, and after a small number of failures, it deletes the link from the 
    382 LSDB so it will route around the failure.  The link may reappear in 
    383 subsequent topology reports, if the failure was temporary. 
     382header, and after a small number of failures, deletes the link from the 
     383LSDB so it will route around the failure.%   The link may reappear in 
     384% subsequent topology reports, if the failure was temporary. 
    384385 
    385386The final table is the Route Install Table: it contains routes installed 
    386387based on the Link State Database of a {\controller} at some point in time.  It 
    387 is also not proactively maintained: when a link which is part of 
    388 an installed route fails, the packet is forwarded to a \controller.  In 
     388is also not proactively maintained: when an installed route fails,  
     389the packet is forwarded to a \controller.  In 
    389390addition to removing the link from the LSDB after a small number of failures, 
    390391the {\controller} will generate a ``Route Uninstall'' message so that the node 
    391 router does not continue to use the same erroneous route. 
     392router does not continue to use the broken route. 
    392393 
    393394 
  • docs/Lowthane/ipsn10/discussion.tex

    r2716 r2785  
    3636corresponding to the multicast group.  It would then unicast the message to a 
    3737small number of nodes within each connected component (for reliability); these 
    38 nodes would then run a dissemination algorithm like Trickle, where only group 
    39 members participate.  This is essentially an implementation of of Levis's 
     38nodes would then run a dissemination algorithm like Trickle 
     39This is essentially an implementation of of Levis's 
    4040Firecracker protocol \cite{firecracker}. 
    4141 
     
    6666\subsection{Limitations}\label{sec:limitations} 
    6767 
    68 \lowthane~has some important limitations which impact its suitability for some 
    69 applications. 
     68% \lowthane~has some important limitations which impact its suitability for some 
     69% applications. 
    7070 
    7171\subsubsection{Stretch} 
  • docs/Lowthane/ipsn10/evaluation.tex

    r2784 r2785  
    1313months in a real deployment.  On two experimental testbeds, we 
    1414evaluate {\lowthane}'s scalability, performance, and resilience across a 
    15 range of workloads and failure conditions. 
     15range of workloads and failures. 
    1616 
    1717\subsection{Questions} 
     
    366366%% L2Ns.   
    367367 
    368 \subsubsection{Basic Operation} 
    369  
    370 To see {\lowthane}'s basic mechanisms in action, we first focus on a trace of 
    371 50 packets from a single flow running on testbed A.  The measure of 
    372 interest is ``transmissions per success,'' which counts the total number of link 
    373 layer transmissions necessary to deliver a packet from a source to a destination. 
    374 % This flow was taken from an experiement on 
    375 % Testbed A, with 4 other concurrent flows, and with the Flow Table being empty 
    376 % at the beginning of the experiment.  %% This allows us to verify the basic 
    377 %% functionality of {\lowthane}'s centralized route install. 
    378  
    379 \begin{figure} [htb] 
    380 \centering 
    381 \includegraphics[width=0.7\columnwidth]{figs/BlowByBlow.pdf} 
    382 \caption{Number of transmissions for each packet of a single flow.  The first 
    383   packet triggers a Route Install message, which reduces the path length from 
    384   6 to 2.} 
    385 \label{fig:blowbyblow} 
    386 \end{figure} 
    387  
    388 Figure~\ref{fig:blowbyblow} shows {\lowthane}'s installation mechanism 
    389 in action.  The first packet is forced to traverse the Default Route 
    390 through a \controller, a four-hop path (plus two retransmissions, for 
    391 a total of six transmissions).  However, this first packet prompts the 
    392 installation of a shorter and more direct two-hop path which is used 
    393 for all subsequent packets, reducing the stretch by a factor of two. 
    394 We also see the effect of link failures: delivering the first packet 
    395 to the {\controller} required two retransmissions for a total of six 
    396 transmissions, and subsequent link failures occasionally require a 
    397 retransmission. 
     368% \subsubsection{Basic Operation} 
     369 
     370% To see {\lowthane}'s basic mechanisms in action, we first focus on a trace of 
     371% 50 packets from a single flow running on testbed A.  The measure of 
     372% interest is ``transmissions per success,'' which counts the total number of link 
     373% layer transmissions necessary to deliver a packet from a source to a destination. 
     374% % This flow was taken from an experiement on 
     375% % Testbed A, with 4 other concurrent flows, and with the Flow Table being empty 
     376% % at the beginning of the experiment.  %% This allows us to verify the basic 
     377% %% functionality of {\lowthane}'s centralized route install. 
     378 
     379% \begin{figure} [htb] 
     380% \centering 
     381% \includegraphics[width=0.65\columnwidth]{figs/BlowByBlow.pdf} 
     382% \caption{Number of transmissions for each packet of a single flow.  The first 
     383%   packet triggers a Route Install message, which reduces the path length from 
     384%   6 to 2.} 
     385% \label{fig:blowbyblow} 
     386% \end{figure} 
     387 
     388% Figure~\ref{fig:blowbyblow} shows {\lowthane}'s installation mechanism 
     389% in action.  The first packet is forced to traverse the Default Route 
     390% through a \controller, a four-hop path (plus two retransmissions, for 
     391% a total of six transmissions).  However, this first packet prompts the 
     392% installation of a shorter and more direct two-hop path which is used 
     393% for all subsequent packets, reducing the stretch by a factor of two. 
     394% We also see the effect of link failures: delivering the first packet 
     395% to the {\controller} required two retransmissions for a total of six 
     396% transmissions, and subsequent link failures occasionally require a 
     397% retransmission. 
    398398 
    399399%% shows the number of transmissions needed to 
     
    418418\subsubsection{Concurrent Flows} 
    419419 
    420 Expanding our test from a single flow, we test {\lowthane}'s ability 
     420First, we test {\lowthane}'s ability 
    421421to support a relatively large number of concurrent low bandwidth flows.  Concurrency stresses 
    422422many aspects of a routing protocol, especially {\it reliability} and {\it 
     
    440440\subfigure[\lowthane, with (centralized) and without (triangle) the 
    441441route-installation optimization] { 
    442   \includegraphics[width=0.7\columnwidth]{figs/Control-double.pdf} 
     442  \includegraphics[width=0.65\columnwidth]{figs/Control-double.pdf} 
    443443  \label{fig:concurrent-smote-pdr} 
    444444} 
    445445\subfigure[AODV performance] { 
    446   \includegraphics[width=0.7\columnwidth]{figs/TinyAODV.pdf} 
     446  \includegraphics[width=0.65\columnwidth]{figs/TinyAODV.pdf} 
    447447  \label{fig:concurrent-aodv} 
    448448} 
     
    494494  variability without the centralized optimization}.  We also note that {\it 
    495495  installing routes reduces 
    496   stretch}, as Figure \ref{fig:concurrency-smote-trans} illustrates.  Testbed A 
    497 has a diameter of only 3 or 4, and so the consistent reduction in stretch by 
    498 one hop is quite significant in relation to the network diameter. 
     496  stretch}, as Figure \ref{fig:concurrency-smote-trans} illustrates.  % Testbed A 
     497% has a diameter of only 3 or 4, and so the consistent reduction in stretch by 
     498% one hop is quite significant in relation to the network diameter. 
    499499 
    500500\subsubsection{AODV Comparison} 
     
    549549\begin{figure*}[htbp] 
    550550\centering 
    551 \setlength\SUBSIZE{0.55\columnwidth} 
     551\setlength\SUBSIZE{0.45\columnwidth} 
    552552% good noise 
    553553\subfigure[Concurrent load]{ 
     
    584584 
    585585\subsubsection{Failure Reliability} 
    586 Centralized systems are often criticized for being too slow to detect and 
    587 react to remote events.  It is important that the optimization we introduced 
     586% Centralized systems are often criticized for being too slow to detect and 
     587% react to remote events.   
     588It is important that the optimization we introduced 
    588589does not break in the case of node and link failures.  Therefore, we next 
    589590turn our attention to {\lowthane}'s robustness to  
     
    602603\begin{figure} [h] 
    603604\centering 
    604 \includegraphics[width=0.85\columnwidth]{figs/Single_Failure_Control-6.pdf} 
     605\includegraphics[width=0.65\columnwidth]{figs/Single_Failure_Control-6.pdf} 
    605606\caption{Packet delivery ratio and control traffic statistics for a single 
    606607  flow amidst node failures: each vertical line indicates that four nodes 
     
    649650\begin{figure} [h] 
    650651\centering 
    651 \includegraphics[width=0.85\columnwidth]{figs/Failure_Control-6.pdf} 
     652\includegraphics[width=0.65\columnwidth]{figs/Failure_Control-6.pdf} 
    652653\caption{Packet delivery ratio and control traffic statistics for multiple 
    653654  flows amidst node failures.  Approximately half the nodes are involved in a 
     
    664665\ref{fig:mfailure-smote-pdr} shows the same view of the data: at a high 
    665666level, it appears that {\lowthane} still responds well to the network 
    666 failures.  We see an interesting spike in control traffic around 1000 seconds 
    667 into the experiment: at this point we have removed approximately $1/3$ of the 
    668 network, and the DAG forms a brief loop, which persists for several minutes 
    669 before the DAG re-forms and success rates return to near 
    670 100\%.  We can also note that the transmission stretch necessary to deliver a 
     667failures.   
     668At around 1000 seconds into the experiment and we have removed $1/3$ of the 
     669network, traffic spikes as a loop is formed and then resolved by the DAG 
     670maintenance protocol. 
     671% We see an interesting spike in control traffic around 1000 seconds 
     672% into the experiment: at this point we have removed approximately $1/3$ of the 
     673% network, and the DAG forms a brief loop, which persists for several minutes 
     674% before the DAG re-forms and success rates return to near 
     675% 100\%.   
     676The transmission stretch necessary to deliver a 
    671677packet oscillates throughout the experiment (Figure 
    672678\ref{fig:mfailure-smote-trans}) as packets must be re-routed across longer paths. 
     
    674680is in the process of recovering when the final nodes are killed, partitioning 
    675681the network and making recovery impossible. 
    676  
    677682Furthermore, Figures \ref{fig:sfailure-smote-pdr} and 
    678683\ref{fig:mfailure-smote-pdr} demonstrate that the DAG control traffic remains 
     
    686691in the face of substantial failure.  One design feature that we did 
    687692not originally anticipate but which proved critical was {\it 
    688   explicitly uninstalling Flow Table state}.  While many protocols use 
    689 timeouts to expire soft state, there is no ``natural period'' at which 
    690 one can say this table state is stale; instead, it makes sense to 
    691 allow it to persist until it is needed.  If we discover it is 
    692 inaccurate, the packet will still generally be delivered using the 
     693  explicitly uninstalling Flow Table state}.  % While many protocols use 
     694% timeouts to expire soft state, there is no ``natural period'' at which 
     695% one can say this table state is stale; instead, it makes sense to 
     696% allow it to persist until it is needed.   
     697If we discover (through a re-routed packet) that an installed path is broken,  
     698the packet will still generally be delivered using the 
    693699default route, but failing to uninstall or replace the route leads to high 
    694700transmission counts as the source repeatedly tries to use a broken