Reconfigurable Intelligent Optical BackPlane for Parallel Computing and Communications

Ted H. Szymanski
Harvard Scott Hinton, Utah State University
Analysis of a microchannel interconnect based on the clustering of smart-pixel-device windows

D. R. Rolston, B. Robertson, H. S. Hinton, and D. V. Plant

A design analysis of a telecentric microchannel relay system developed for use with a smart-pixel-based photonic backplane is presented. The interconnect uses a clustered-window geometry in which optoelectronic device windows are grouped together about the axis of each microchannel. A Gaussian-beam propagation model is used to analyze the trade-off between window size, window density, transistor count per smart pixel, and lenslet f-number for three cases of window clustering. The results of this analysis show that, with this approach, a window density of 4000 windows/cm² is obtained for a window size of 30 µm and a device plane separation of 25 mm. In addition, an optical power model is developed to determine the nominal power requirements of a 32 × 32 smart-pixel array as a function of window size. The power requirements are obtained assuming a complementary metal-oxide semiconductor inverter–amplifier and dual-rail multiple-quantum-well self-electro-optic-effect devices as the receiver stage of the smart pixel.

1. Introduction

Future digital systems, such as asynchronous transfer mode switching networks and massively parallel processing machines, will have large printed-circuit-board-to-printed-circuit-board connectivity requirements in order to support the large aggregate throughput demands being placed on these systems.1 Current electronic interconnect technology may not be capable of supporting both the connection densities and the bandwidth required in these systems because of connector limitations at the PCB-to-backplane interface.2 Two-dimensional, free-space optical interconnects represent a potential solution to the needs of these high-speed, connection-intensive digital systems. Faster clock rates associated with CPU's and memory3 and increased parallelism in computer architectures are two features of future systems that may benefit from the attributes of free-space optics. When implemented at the PCB-to-PCB level in the form of a photonic backplane, for example, this technology can potentially provide greater connectivity at higher data rates than current or future electrical backplanes.4

Numerous optical schemes that offer improvements to the metal-based interconnect have been proposed and demonstrated. A board-to-board optical interconnect built by Sakano et al.5 is a prototype interconnect that uses bulk optics and light-emitting diodes to connect 64 processors in a three-dimensional mesh. A board-to-board interconnect that uses diffractive optics and light-emitting-diode arrays, built by Dhoedt et al.,6 is an example applied to massively parallel processing (MPP) machines. In another system by Redmond and Schenfeld,7 vertical-cavity surface-emitting lasers and microlens arrays were used to interconnect cache memory and processors. Plant et al. used vertical-cavity surface-emitting lasers and metal-semiconductor-metal detectors to demonstrate a board-to-board interconnection in an electronic backplane chassis.8 Finally, work on free-space photonic-switching networks that used field-effect transistor self-electro-optic-effect devices (FET-SEED's) and complementary metal-oxide semiconductor SEED's (CMOS/SEED's) by McCormick et al.9 and Krishnamoorthy et al.,10 respectively, has demonstrated the connection-intensive capability of free-space optics.

In this paper we present an analysis of a novel microchannel-based interconnection scheme developed for photonic-backplane applications. Figure 1(a) illustrates the concept of a photonic backplane that employs a free-space interconnect scheme, making use of previously demonstrated techniques of through-chip interconnects. In principle, this approach could be extended to through-board interconnects. The alignability of this setup will undoubtedly determine the practicality of this approach, but for simplicity this subject is not fully addressed here.

Previous study in this area has typically concentrated on interconnects in which a single optical beam is transmitted along each microchannel. The system presented here uses an alternative telecentric lenslet arrangement that allows multiple signal beams to be relayed via each microchannel, as shown in Fig. 1(b). The system’s size and physical layout are in keeping with the size and the geometry of standard electrical backplanes. The most critical parameter of this analysis is the channel or window density. This is a measure of the number of point-to-point connections made with regard to the physical space limitations and will have a significant impact on the amount of data throughput a single switching node or parallel machine can have.

The interconnect model presented is designed to operate with smart-pixel arrays in which each smart pixel is capable of electrical-to-optical and optical-to-electrical conversion of digital data. Thus, as shown in Fig. 1(b), each smart pixel contains a cluster of optical input–output windows capable of transmitting and receiving data via a single microchannel. In addition to optical-to-electrical and electrical-to-optical conversions, the smart pixel can perform high-speed processing operations at the backplane level, such as address recognition or packet routing.

An estimate of the aggregate throughput of such a system can be established with reference to Fig. 1(a). Consider several boards interconnected with 10 opto-electronic chips per PCB, in which each chip contains a 32 × 32 smart-pixel array (1024 smart pixels). If each pixel operates at 100 megabits per second (Mbps), the optical interconnect (or photonic backplane) would support greater than a terabit per second of aggregate data throughput.
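The throughput estimate above is simple arithmetic, and a short Python sketch makes it explicit. The chip count, array size, and bit rate are taken from the text; the board count of three is only an illustrative assumption, since the text says "several boards".

```python
# Aggregate throughput of the photonic backplane sketched in Fig. 1(a).
CHIPS_PER_PCB = 10           # optoelectronic chips per board (from the text)
PIXELS_PER_CHIP = 32 * 32    # 32 x 32 smart-pixel array = 1024 smart pixels
BIT_RATE_BPS = 100e6         # 100 Mbps per smart pixel (from the text)


def aggregate_throughput_bps(num_boards):
    """Total data rate in bits per second for num_boards boards."""
    return num_boards * CHIPS_PER_PCB * PIXELS_PER_CHIP * BIT_RATE_BPS


print(f"one board: {aggregate_throughput_bps(1) / 1e12:.3f} Tbps")
# The board count below is an assumed example value.
print(f"three boards: {aggregate_throughput_bps(3) / 1e12:.3f} Tbps")
```

A single board already carries slightly more than 1 Tbps under these figures, so any multi-board system exceeds the terabit mark.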

In this paper we investigate the optical limitations of this clustered-window interconnection geometry and the effect that it has on smart-pixel design. The first part of the paper describes a Gaussian-beam propagation model that was used to analyze the dependence of connection density and transistor count versus window size. A number of physical design constraints based on lenslet f-number ($f/#$) and wiring layout restrictions are also discussed. The model is then used to analyze three different window-cluster geometries in order to determine an optimum range of interconnect parameters. In the final section, an estimate of the optical power required for a specific receiver design and interconnect layout is given. For the purposes of this work, multiple-quantum-well (MQW) symmetric SEED (S-SEED) receivers and modulators were assumed; however, this interconnection geometry may be used with any optoelectronic device. A circuit model of the optical receiver is used to determine the required optical power as a function of optoelectronic device size. This verifies that the operating region, suggested by the optical analysis, remains valid with respect to the optical power required.

2. Interconnection Model

A simple Gaussian-beam propagation model was used to analyze the performance limits of the clustered-window geometry. In considering the optical layout, several assumptions pertaining to the optical interconnect geometry were made in order to limit the number of variable design parameters and provide a tractable solution. For simplicity, it was assumed that the optical interconnect consisted of a single 4-f telecentric optical relay. The window cluster was defined as a regular $M \times N$ array of optical windows positioned symmetrically about the optical axis of the microchannel. Each optical window (the active region of an optoelectronic device on the surface of a chip) was square with dimensions of $d_w \times d_w$, and the separation between optical windows, $d_s$, was constant across the cluster. The cluster dimensions were given by $l_h \times l_v$ (Fig. 2).
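The equations relating the Gaussian beam at the window and at the lenslet used in this analysis are
\[
\omega_b = \omega_0 \left[ 1 + \left( \frac{\lambda f}{\pi \omega_0^2} \right)^{2} \right]^{1/2}, \quad (1)
\]
\[
d_w = 2r_0 = 3\omega_0, \quad (2)
\]
\[
2r_b = 3\omega_b, \quad (3)
\]
from which it can be shown that
\[
r_b = \frac{d_w}{2} \left[ 1 + \left( \frac{9\lambda f}{\pi d_w^2} \right)^{2} \right]^{1/2}, \quad (4)
\]
where \(r_b\) is the effective beam radius at the lenslet, \(r_0\) is the effective beam radius at the window, \(\omega_0\) and \(\omega_b\) are the corresponding \(1/e^2\) beam radii, \(\lambda\) is the wavelength, and \(f\) is the lenslet focal length.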

In the case of an \(M \times N\) cluster that has the dimensions of
\[
l_h = M d_w + (M - 1)d_s, \quad (5)
\]
\[
l_v = N d_w + (N - 1)d_s, \quad (6)
\]
the longer side of the cluster will govern the dimension of the square lenslet:
\[
l = \max[l_h, l_v]. \quad (7)
\]
The lenslet size will therefore be
\[
D_L = l - d_w + 2r_b. \quad (8)
\]

When Eqs. (1)–(7) are substituted into Eq. (8), an equation for the lenslet size \(D_L\) in terms of window dimensions \(d_w\), window separation \(d_s\), and size of window array \(M \times N\) can be derived:
\[
D_L = f(M, N, d_w, d_s).
\]
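As a minimal sketch of how Eqs. (4)–(8) combine, the following Python function evaluates the lenslet size for a given cluster. Two values are assumptions that this section does not state: a wavelength of 0.85 µm, and a focal length of 6.25 mm taken as one quarter of the 25-mm device-plane separation quoted in the abstract for a 4-f relay. All function and constant names are illustrative.

```python
import math

WAVELENGTH_UM = 0.85      # assumed operating wavelength (not given in this section)
FOCAL_LENGTH_UM = 6250.0  # assumed: 25 mm device-plane separation / 4 for a 4-f relay


def beam_radius_at_lenslet(d_w, wavelength=WAVELENGTH_UM, f=FOCAL_LENGTH_UM):
    """Effective beam radius r_b at the lenslet, Eq. (4). Lengths in micrometers."""
    ratio = 9.0 * wavelength * f / (math.pi * d_w ** 2)
    return 0.5 * d_w * math.sqrt(1.0 + ratio ** 2)


def lenslet_size(M, N, d_w, d_s, wavelength=WAVELENGTH_UM, f=FOCAL_LENGTH_UM):
    """Lenslet side length D_L from Eqs. (5)-(8). Lengths in micrometers."""
    l_h = M * d_w + (M - 1) * d_s   # Eq. (5)
    l_v = N * d_w + (N - 1) * d_s   # Eq. (6)
    l = max(l_h, l_v)               # Eq. (7)
    return l - d_w + 2.0 * beam_radius_at_lenslet(d_w, wavelength, f)  # Eq. (8)


# 4 x 4 cluster with a 31.75 um window and 16.4 um window separation
print(f"D_L = {lenslet_size(4, 4, 31.75, 16.4):.1f} um")
```

Under these assumed values the 4 × 4 example evaluates to a lenslet size close to 625 µm.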

Based on the assumption that routing trace lines out of a cluster on a chip will prevent windows from being extremely tightly packed, the parameter \(d_s\) was related to \(M, N\), and the fabrication restrictions associated with the chip technology. Although some optoelectronic integration techniques, such as solder-bump bonding,\(^{16}\) allow optoelectronics to be placed directly above silicon circuitry, thereby slightly altering trace-line routing conditions, the assumption made here was that the cluster would be too densely packed to allow logic to be placed between windows and that trace lines would be the only features present within the cluster.

The separation between windows \(d_s\) was assumed to be dependent on only five variables: the size of the cluster \((M \times N)\), the trace-line width \(w\), the trace-line separation \(s\), and the number of metal layers on the chip. The trace-line width and separation were chosen to be 4.2 and 3.2 \(\mu\text{m}\), respectively, exactly double the minimum feature size of metal 3 for the 0.8-\(\mu\text{m}\) CMOSX Vendor Rules.\(^{17}\) Initially we considered the case in which the cluster was square \((M = N)\), in which every window consisted of a two-terminal device and only one type of metal trace line on the chip was permitted. The trace lines could then be routed to only two of the four sides of the cluster, thereby allowing the window separation in one direction to be close to zero and the window separation in the other direction [Fig. 3(a)] to be given by

\[ d_s^* = Mw + (M + 1)s. \]  \hspace{1cm} (9)

To produce a more general equation for window separation, two metal layers were assumed available; in practice, this would be a more likely case. With this assumption, the terminals of the devices could be routed out on all four sides of the cluster, and Eq. (9) could be divided by two [Fig. 3(b)], giving a window separation in both directions of

\[ d_s = (d_s^*/2) = [Mw + (M + 1)s]/2. \]  \hspace{1cm} (10)

This description does not exclude other routing strategies, but does provide a quantitative method for obtaining a window separation that depends on chip layout. Under these conditions, the window separation cannot be considered as an independent variable.
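The routing rule of Eqs. (9) and (10) can be sketched in a few lines of Python. The trace-line width and separation are the values quoted above; the function name and the `metal_layers` argument are illustrative choices, not part of the original model.

```python
TRACE_WIDTH_UM = 4.2   # trace-line width w (from the text)
TRACE_SPACE_UM = 3.2   # trace-line separation s (from the text)


def window_separation(M, metal_layers=2, w=TRACE_WIDTH_UM, s=TRACE_SPACE_UM):
    """Window separation in micrometers for a square M x M cluster."""
    d_star = M * w + (M + 1) * s      # Eq. (9): one metal layer
    if metal_layers == 1:
        return d_star
    if metal_layers == 2:
        return d_star / 2.0           # Eq. (10): traces leave on all four sides
    raise ValueError("only one or two metal layers are covered by Eqs. (9) and (10)")


for M in (2, 4):
    print(M, round(window_separation(M), 2))
```

For a 4 × 4 cluster with two metal layers this gives 16.4 µm.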

One of the more critical parameters that must be taken into account when determining the connectivity and scalability of a free-space system is the window density. In this paper the term window refers to the active region of any optoelectronic device without specifying whether it modulates, emits, or detects light. In this way, any optical bit-encoding technique (dual rail, single rail, etc.) and any optical fan-in or fan-out can be derived from the more general window density. The window density was defined as the number of optical windows per unit cross-sectional area of interconnect; based on this definition, the window density was given by

\[ W_{\text{Den}} = MN/D_L^2. \]  \hspace{1cm} (11)
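A short Python sketch of the window-density definition, with the lenslet size given in micrometers and the result expressed in windows/cm². The function name is illustrative.

```python
def window_density_per_cm2(M, N, lenslet_size_um):
    """Window density: M*N windows per lenslet footprint, in windows/cm^2."""
    lenslet_size_cm = lenslet_size_um * 1e-4   # 1 um = 1e-4 cm
    return M * N / lenslet_size_cm ** 2


# A 4 x 4 cluster under a 625 um lenslet and a single window under a
# 156.25 um lenslet give the same density.
print(round(window_density_per_cm2(4, 4, 625.0)))
print(round(window_density_per_cm2(1, 1, 156.25)))
```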

The \( f/\# \) is an important parameter that governs the performance of the optical system and is explored here. In the analysis below we define an effective lenslet \( f/\# \):

\[ f/\# = \frac{f}{\text{Diameter}}, \]  \hspace{1cm} (12)

where the diameter is given by the farthest possible distance between the far edges of two beams passing through a lenslet [using the \( 3\omega \) beam waists defined above]. The diameter is then given by [Fig. 4]

\[ \text{Diameter} = 2r_b + \left[ (l_h - d_w)^2 + (l_v - d_w)^2 \right]^{1/2}. \]  \hspace{1cm} (13)

---

Fig. 3. (a) Routing out of a cluster with one metal trace line, (b) Routing out of a cluster with two metal trace lines (conceptual drawing).

Fig. 4. Diameters given by the far edges of two beams to calculate the effective \( f/\# \).

This definition differs from the physical \( f/\# \) of the lenslet, \( f/(\sqrt{2}\,D_L) \), but is more useful as it refers to the region of the lenslet through which the light actually passes.
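The difference between the effective and the physical f-number can be illustrated with a small Python sketch. As before, the wavelength (0.85 µm) and focal length (6.25 mm) are assumed values not stated in this section, and the function names are illustrative.

```python
import math


def effective_f_number(M, N, d_w, d_s, wavelength=0.85, f=6250.0):
    """Effective f/# from Eqs. (12) and (13). Lengths in micrometers."""
    ratio = 9.0 * wavelength * f / (math.pi * d_w ** 2)
    r_b = 0.5 * d_w * math.sqrt(1.0 + ratio ** 2)           # Eq. (4)
    l_h = M * d_w + (M - 1) * d_s                            # Eq. (5)
    l_v = N * d_w + (N - 1) * d_s                            # Eq. (6)
    diameter = 2.0 * r_b + math.hypot(l_h - d_w, l_v - d_w)  # Eq. (13)
    return f / diameter                                      # Eq. (12)


def physical_f_number(D_L, f=6250.0):
    """Physical f/# of a square lenslet, using its diagonal sqrt(2) * D_L."""
    return f / (math.sqrt(2.0) * D_L)


print(f"effective f/#: {effective_f_number(4, 4, 31.75, 16.4):.2f}")
print(f"physical  f/#: {physical_f_number(625.0):.2f}")
```

For this 4 × 4 example the effective value is larger than the physical one, because the beams do not fill the lenslet out to its corners.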

To obtain the maximum allowable number of transistors per smart pixel as a function of the interconnect geometry, the size of the lenslet was assumed to govern the space available for the processing electronics associated with each cluster. Thus the area of the lenslet defined a footprint for the underlying electronics. Assuming that a typical chip layout is highly regular and that the smart-pixel logical cell is replicated across the chip, the lenslet will mark the smallest uniquely specified area that can be associated with any one cluster.

The number of transistors is then a function of the area of the lenslet less the area of the cluster. Depending on the type of technology used,\(^{16}\) it was assumed that this would be a region where trace lines would be most densely packed and hence void of transistors. It follows that the number of transistors is given by

\[
T_{x_{\text{per cluster}}} = \left[ D_L^2 - l_h l_v \right] T_{x_{\text{Density}}}, \quad (14)
\]

where \( T_{x_{\text{Density}}} \) is the average number of transistors per unit area and is technology dependent. The transistor density is highly dependent on the chip architecture: the Digital Equipment Corporation’s DEC-Alpha 21064 processor, which uses 0.68-\( \mu \)m CMOS, has a transistor density of \(\sim 900{,}000\) transistors/cm\(^2\),\(^{18}\) whereas high-speed switching chips that employ CMOS LSI may have smaller densities of \(\sim 100{,}000\) transistors/cm\(^2\),\(^{19}\) which is due to the less regular arrangement of logic and registers. For this analysis, a transistor density of 100,000 transistors/cm\(^2\) is assumed.

The equations derived thus far establish a set of four parameters from which an initial design of an optical system and an initial electronic chip architecture can be developed. Given the optical interconnection model described above, the independent variables associated with this interconnect design were window size, cluster size, and, to a certain extent, chip and lenslet technology. In Section 3 a set of boundary conditions for these independent variables is described.

### 3. Model Results

For the purposes of this analysis, we assumed that a smart pixel required four windows, of which two windows served as dual-rail encoded input and two windows served as dual-rail encoded output. Note that these assumptions were arbitrary, but were required in order to define a practical system and place useful boundaries on the design.

Also note that a \( 32 \times 32 \) smart-pixel array (1024 channels) that has four windows per smart pixel will contain a total of 4096 windows. If we assume that we are restricted to a 1 cm \( \times \) 1 cm chip because of packaging considerations, a window density of at least 4096 windows/cm\(^2\) would be required. This would satisfy at least one of the criteria for the above-mentioned terabit backplane.

Three cases of window clustering were analyzed with the equations derived in Section 2. Case 1 was a \( 2 \times 1 \) cluster representing 1 bit of optical data or half a smart pixel, case 2 was a \( 2 \times 2 \) cluster representing 2 bits of optical data or a single smart pixel, and case 3 was a \( 4 \times 4 \) cluster representing 8 bits of optical data or four smart pixels (Fig. 5). These cluster sizes were chosen to be compatible with the base-2 electronic addressing and data word size of almost all electronic architectures. A square physical geometry was also assumed for the cluster because of the symmetry in both the x and the y axes of the microchannel. Note, however, that any cluster geometry could be used, and these cases were chosen only to illustrate the properties of the model.

Figure 6 shows a plot of the lenslet size versus window size and provides a means of relating the size of the chip to the device windows. For example, a very small window would require a large lenslet and as a consequence would require a large chip area. If case 3 is considered, and a \( 32 \times 32 \) smart-pixel array is used, a \( 16 \times 16 \) lenslet array would be required. Assuming the 1 cm \( \times \) 1 cm chip dimensions described above for the maximum chip size, the lenslet size would be 625 \( \mu \)m, and from Fig. 6, a minimum window size of \( \sim 30 \) \( \mu \)m would be required. Note also that cases 1 and 2 give identical results because of the application of Eq. (7).

From Fig. 6 it can be seen that the lenslet size reaches a minimum for a given window size; beyond this window size the lenslet size begins to increase. Physically this is the point where the beam waist at the window equals the beam diameter on the other side of the lenslet. As the window size increases beyond this point, the beam becomes effectively collimated on the windows and focused between the lenslets. Because this analysis was constrained to operate with point sources on the windows, the window size at which this minimum occurred, \( d_{w,\text{MAX}} \), was used to define an upper limit on the window size. This parameter was found by the solution of

\[
\frac{d\left[ D_L(d_{w,\text{MAX}}) \right]}{d\left( d_{w,\text{MAX}} \right)} = 0, \quad (15)
\]

where \( D_L \) is given by Eq. (8). This parameter provides the maximum window dimensions as a function of cluster geometry (Fig. 7) and illustrates the trade-off between the cluster size and the window size.
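The upper limit on window size can also be located numerically rather than by differentiating Eq. (8). The sketch below uses a plain grid search in Python, which is a substitute for the analytic solution and not the authors' method. It assumes a wavelength of 0.85 µm and a focal length of 6.25 mm (neither stated in this section), takes the window separation from Eq. (10) using the longer cluster side, and uses illustrative function names throughout.

```python
import math


def lenslet_size(M, N, d_w, wavelength=0.85, f=6250.0, w=4.2, s=3.2):
    """D_L from Eq. (8), with d_s from Eq. (10). Lengths in micrometers."""
    k = max(M, N)                                   # longer side governs, Eq. (7)
    d_s = (k * w + (k + 1) * s) / 2.0               # Eq. (10), two metal layers
    l = k * d_w + (k - 1) * d_s                     # Eqs. (5)-(7)
    ratio = 9.0 * wavelength * f / (math.pi * d_w ** 2)
    r_b = 0.5 * d_w * math.sqrt(1.0 + ratio ** 2)   # Eq. (4)
    return l - d_w + 2.0 * r_b


def max_window_size(M, N, lo=1.0, hi=300.0, step=0.01):
    """Window size (um) that minimizes D_L, found by scanning a uniform grid."""
    best_d, best_size = lo, lenslet_size(M, N, lo)
    steps = int(round((hi - lo) / step))
    for i in range(1, steps + 1):
        d_w = lo + i * step
        size = lenslet_size(M, N, d_w)
        if size < best_size:
            best_d, best_size = d_w, size
    return best_d


for cluster in ((2, 1), (2, 2), (4, 4)):
    print(cluster, round(max_window_size(*cluster), 1))
```

Larger clusters give a smaller limiting window size, which is the trade-off between cluster size and window size noted above. For a single window per lenslet the search reduces to the closed-form result \( d_w = (9\lambda f/\pi)^{1/2} \).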

Figure 8 illustrates how the $f/\#$ changes with window size. This result indicates that the $f/\#$ of the lenslet is very low for very small window sizes. Although it is advantageous to have small device windows in order to minimize device capacitance, because of fabrication limits it may not be possible to produce a low-$f/\#$ diffractive lenslet with high optical efficiency.

To investigate the effect of fabrication limits on low-$f/\#$ lenslet designs, we performed an efficiency analysis of the two-board interconnect shown in Fig. 9.\(^{20}\) The optical efficiencies of the components used in this analysis are given in Table 1. The system described in Fig. 9 outlines the relevant features of a modulator-based optical interconnect. It is the optical efficiency of this interconnect that is used in Section 4 to obtain a maximum optical power required for a $32 \times 32$ smart-pixel array.

In this analysis it was assumed that the interconnect used multilevel diffractive lenslet arrays designed with a standard analytic-quantization technique.21 These components are relatively easy to produce as square, contiguous lenslets in large arrays by the use of standard photolithographic techniques.

The minimum feature size of a multilevel diffractive lenslet, $T_{\text{min}}$, is governed by the number of phase levels used in the design and the specified $f/\#$. Thus, in order for a particular lenslet to be realizable, $T_{\text{min}}$ must be greater than the minimum feature size that can be fabricated lithographically, $\delta$. Equations (16)–(18) relate these variables to the achievable diffraction efficiency $\eta$ of a multilevel lenslet designed with the analytic-quantization method:

$$T_{\text{min}} = \lambda \left[ 1 + (2f/\#)^2 \right]^{1/2}, \quad (16)$$

$$\Delta = T_{\text{min}}/\delta, \quad (17)$$

$$\eta = \left[ \frac{\Delta}{\pi} \sin\left( \frac{\pi}{\Delta} \right) \right]^2. \quad (18)$$
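A minimal Python sketch of Eqs. (16)–(18), read here as the standard multilevel efficiency $[\sin(\pi/\Delta)/(\pi/\Delta)]^2$ with $\Delta$ treated as a continuous quantity. The wavelength of 0.85 µm is an assumption not stated in this section, and the function names are illustrative.

```python
import math


def min_zone_feature_um(f_number, wavelength_um=0.85):
    """T_min from Eq. (16), in micrometers."""
    return wavelength_um * math.sqrt(1.0 + (2.0 * f_number) ** 2)


def diffraction_efficiency(f_number, delta_um=0.5, wavelength_um=0.85):
    """Multilevel lenslet efficiency from Eqs. (17) and (18)."""
    levels = min_zone_feature_um(f_number, wavelength_um) / delta_um  # Eq. (17)
    x = math.pi / levels
    return (math.sin(x) / x) ** 2                                     # Eq. (18)


for f_number in (1.0, 2.5, 5.0):
    print(f"f/{f_number}: eta = {diffraction_efficiency(f_number):.3f}")
```

With a 0.5-µm minimum feature, the efficiency falls off as the f-number drops, which is the behavior that motivates a lower bound on the lenslet $f/\#$.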

When this fabrication model is applied to the lenslet arrays in the optical interconnect described above, the normalized optical power incident on a receiver can be plotted as a function of window size (Fig. 10), assuming a minimum feature size of $\delta = 0.5$ µm. From this result it can be seen that the efficiency of the interconnect decreases rapidly below a window size of $\sim 10$ µm. For this reason the lower bound on the $f/\#$ was chosen to be 2.5, to ensure that as much light as possible reaches the receiver. This provided yet another lower bound on window size. Because the three cases considered converged toward the same point for small window sizes, and because lenslet fabrication above $\sim f/2.5$ was assumed reasonable in terms of efficiency, all the cases were independent of the $f/\#$ restrictions for window sizes greater than 10 µm.

The transistor count as a function of window size (Fig. 11) provided a general trend to smart-pixel complexity. A density of 100,000 transistors/cm$^2$ was chosen, as discussed above, and no transistors were permitted under the cluster. From this graph it can be seen that it is more desirable to have smaller windows if more complex smart-pixel architectures are required. An upper bound for a window size of $\sim 50$ µm is assumed if at least 250 transistors are required per smart pixel.

The window-density figure (Fig. 12) provided the largest insight into the optimum clustered-window geometry. Because high window densities are necessary to satisfy connection-intensive systems within certain physical sizes, there exists a definite advantage to larger clustering. Case 3 has a significant advantage in terms of window density over the other two cases, and more importantly, a density of at least 4096 windows/cm$^2$ can be achieved with moderately small windows of $\sim 30$ µm.

One of the problems that will affect the performance of free-space microchannel relays is the sensitivity of these systems to misalignment. Even small translational and rotational errors can drastically reduce the interconnect efficiency and increase the level of optical cross talk between neighboring optical communication channels.

For simplicity, it is assumed that the lenslet arrays are accurately prealigned with respect to the smart-pixel arrays. The effect of a translational or a rotational misalignment can therefore be determined, to a first approximation, by ignoring diffraction effects and assuming that all the light incident upon a particular lenslet facet is directed into the corresponding device window. To illustrate this, let us consider the simple case of a single Gaussian beam of $1/e^2$ radius $\omega_b$, incident upon a square lenslet facet of dimensions $D_L \times D_L$, as shown in Fig. 13(a). If the beam is misaligned by a distance $(\Delta x, \Delta y)$, the amount of light coupled into the device window, defined as the coupling efficiency $CE$, is given by

$$CE(\Delta x, \Delta y) = \frac{1}{4}\, \text{erf} \left( \frac{-D_L/2 - \Delta x}{\kappa}, \frac{D_L/2 - \Delta x}{\kappa} \right) \times \text{erf} \left( \frac{-D_L/2 - \Delta y}{\kappa}, \frac{D_L/2 - \Delta y}{\kappa} \right), \quad (19)$$

where $\kappa = \omega_b/\sqrt{2}$ and $\text{erf}(z_1, z_2)$ is the generalized error function,

$$\text{erf}(z_1, z_2) = \frac{2}{\sqrt{\pi}} \int_{z_1}^{z_2} \exp(-t^2)dt. \quad (20)$$

In the case of a clustered-window geometry, the situation will be more complex. The effect of the offset of a particular device window with respect to the microchannel optical axis must also be included. As a result, the window at the extreme corner of a cluster will be the most sensitive to any misalignment. For a $4 \times 4$ window cluster [Fig. 13(b)], the dependence of the coupling efficiency on misalignment for the far corner window is given by

$$CE(\Delta x, \Delta y) = \frac{1}{4}\, \text{erf} \left( \frac{-D_L/2 - s_x - \Delta x}{\kappa}, \frac{D_L/2 - s_x - \Delta x}{\kappa} \right) \times \text{erf} \left( \frac{-D_L/2 - s_y - \Delta y}{\kappa}, \frac{D_L/2 - s_y - \Delta y}{\kappa} \right), \quad (21)$$

where $s_x$ and $s_y$ are the offsets and are equal to $(3/2)(d_s + d_w)$.
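The coupling-efficiency model can be sketched with the standard-library error function. The sketch assumes that each axis contributes half of the generalized error function between the two facet edges, so that a perfectly captured beam gives $CE = 1$; this normalization is an interpretation of the model rather than something stated explicitly. Function names are illustrative.

```python
import math


def axis_fraction(half_width, offset, omega_b):
    """Fraction of Gaussian power (1/e^2 radius omega_b) falling inside
    [-half_width, +half_width] when the beam center sits at `offset`."""
    kappa = omega_b / math.sqrt(2.0)
    z1 = (-half_width - offset) / kappa
    z2 = (half_width - offset) / kappa
    return 0.5 * (math.erf(z2) - math.erf(z1))


def coupling_efficiency(D_L, omega_b, dx=0.0, dy=0.0, s_x=0.0, s_y=0.0):
    """CE for a square facet D_L x D_L; s_x, s_y are window offsets from the axis."""
    half = D_L / 2.0
    return axis_fraction(half, s_x + dx, omega_b) * axis_fraction(half, s_y + dy, omega_b)


# Far-corner window of a 4 x 4 cluster: offsets of (3/2)(d_s + d_w)
offset = 1.5 * (16.4 + 31.75)
for dx in (0.0, 50.0, 138.0):
    ce = coupling_efficiency(625.0, 160.0, dx=dx, s_x=offset, s_y=offset)
    print(f"dx = {dx:5.1f} um -> CE = {ce:.3f}")
```

With $D_L = 625$ µm and $\omega_b = 160$ µm, this sketch puts the far-corner window at a coupling efficiency of roughly 0.9 for a translational misalignment of 138 µm.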

Table 1. Nominal Efficiencies for the Optical Components used in the Two-Board Interconnect of Fig. 9

<table>
<thead>
<tr>
<th>Number in Fig. 9</th>
<th>Optical Element</th>
<th>Efficiency</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Fiber</td>
<td>—</td>
</tr>
<tr>
<td>2</td>
<td>Collimating lens</td>
<td>0.94</td>
</tr>
<tr>
<td>3</td>
<td>Binary-phase grating</td>
<td>0.679</td>
</tr>
<tr>
<td>4</td>
<td>Risley beam steerer</td>
<td>0.92</td>
</tr>
<tr>
<td>5</td>
<td>Focusing lens</td>
<td>0.94</td>
</tr>
<tr>
<td>6</td>
<td>Lenslet array</td>
<td>$f/#$ dependent</td>
</tr>
<tr>
<td>7</td>
<td>Polarized beam splitter</td>
<td>0.964</td>
</tr>
<tr>
<td>8</td>
<td>Quarter-wave plate</td>
<td>0.94</td>
</tr>
<tr>
<td>9</td>
<td>Lenslet array</td>
<td>$f/#$ dependent</td>
</tr>
<tr>
<td>10</td>
<td>Transmitter plane</td>
<td>0.3/0.6</td>
</tr>
<tr>
<td>11</td>
<td>Lenslet array</td>
<td>$f/#$ dependent</td>
</tr>
<tr>
<td>12</td>
<td>Receiver plane</td>
<td>—</td>
</tr>
</tbody>
</table>

Fig. 10. Optical power delivered through the optical system for one receiver, for which the efficiency of the lenslet is dependent on the $f/#$. A normalized input power of 1 W is used.

Fig. 11. Number of transistors per lenslet as a function of window size and cluster size.

Fig. 12. Window density as a function of window size and cluster size.
The above analysis may be used to compare the alignment tolerances of a clustered-window design with those of a single window per lenslet geometry. As an example, consider a microchannel relay designed to interconnect 4096 windows within a 1 cm × 1 cm area with a single window per lenslet (corresponding to a 32 × 32 smart-pixel array, as discussed above). This would require a lenslet dimension of 156.25 µm × 156.25 µm, with the beam radius at the lenslet facet restricted by \( 3\omega_b = D_L \). Note also that these misalignment tolerances are not specific to either a telecentric relay or a non-telecentric relay. The dependence of coupling efficiency on translational misalignment \( \Delta x \) for a relay based on a single lenslet per window geometry is shown in Fig. 14.

The relationship between the coupling efficiency and \( \Delta x \) for a \( 4 \times 4 \) clustered-window geometry has also been explored [Fig. 13(b)]. The window density of \( W_{\text{Den}} = 4096 \) windows/cm\(^2\) was again used and required a lenslet dimension of \( D_L = 625 \) µm. This had an associated window size of \( d_w = 31.75 \) µm, a window separation of \( d_s = 16.4 \) µm, and a beam radius at the lenslet of \( \omega_b = 160 \) µm [Fig. 2]. These parameters were calculated with the interconnect model described in Section 2. From this result it can be seen that the \( 4 \times 4 \) clustered-window arrangement is significantly less sensitive to translational misalignments than the single lenslet per window geometry (Fig. 14).

The above model was also used to estimate the comparative sensitivity of a clustered-window configuration and an equivalent single lenslet per device window geometry to rotational misalignments. Again, a 32 × 32 array of smart pixels, each containing four optical input–output windows, and a window density of 4096 windows/cm\(^2\) were assumed. The dependence of the coupling efficiency on \( \phi \), the rotational misalignment about the exact center of the smart-pixel array, is shown in Fig. 15 for both cases. From this result it can be seen that the \( 4 \times 4 \) clustered-window geometry is approximately four times less sensitive to rotational misalignments, assuming that a 90% CE is required.

From these two calculations, it is possible to estimate the translational and the rotational alignment tolerances required for implementing a PCB-to-PCB data link based on a \( 4 \times 4 \) clustered-window geometry. Assuming that the CE must be kept above 0.9 to minimize the effect of optical cross talk, a rotational misalignment of less than 1.65° and a translational misalignment of less than 138 µm must be obtained. Although a challenge, this is within the capabilities of current optomechanical technology. A full diffraction-based analysis of the alignment tolerances of a clustered-window relay will be the subject of a later publication.

4. Optical Power Model

The parameters developed in the sections above provided optical characteristics and numbers of transistors for a preliminary system design, but still lacked an estimate of the optical power required. In this section, a simple optoelectronic receiver was theoretically analyzed and simulated for different window sizes and bit rates to estimate the largest window permissible with present-day optical power limitations.

The design of an optical system is highly dependent on the performance of the optoelectronic receiver electronics. The circuit used in the analysis below is based on a MQW S-SEED pair. The transmitter modulator pair was chosen to be a directly addressable amplified differential modulator with a contrast ratio of 2 to 1 or 60% to 30% reflectivity, depending on applied bias voltage. The modulation of the beams had two states, one state encoded a logical 1 with high–low beams and the other state encoded a logical 0 with low–high beams.

Two major constraints affected the design of the receiver. The first was the speed with which it could change state, which is a function of optical power and device impedance. The second was the receiver sensitivity, a parameter that was dependent on the threshold of the amplifier. The receiver used in this analysis was a CMOS SEED design that was chosen because of its simplicity and the relative ease with which the circuit can be modeled. The receiver was a simple open-loop amplifier; it used an independently adjustable diode clamp totem pole, an S-SEED biased at $\Delta V = 7.2$ V, and a minimum feature size 0.8-µm CMOS inverter–amplifier as the front end of the optical receiver.

The initial analysis considered the amplifier without any diode clamps at all; the diode clamps are easily introduced into the model at the end of the derivation and provide an increased bit rate. For this circuit, the impedance seen at node $V_x$ is almost purely capacitive. The capacitance consisted of the gates of the transistors and the capacitances of the MQW's. The P-i-N photodiodes have an assumed responsivity of 0.5 A/W.

The lumped parameter model of this circuit is shown in Fig. 17. The total capacitance at the node $V_x$ is given by

$$C_T = C_{M1} + C_{M2} + C_{gn} + C_{gp}, \quad (22)$$

where $C_{M1}$ and $C_{M2}$ are the MQW capacitances and $C_{gp}$ and $C_{gn}$ are the transistor capacitances. The total current into node $V_x$ that is due to the photocurrent from the MQW diodes is given by

$$\Delta I_{\text{MAX}} = I_{p1} - I_{p2}. \quad (23)$$

Because this is a first-order circuit that can be modeled by a current source in parallel with a capacitance, a constant called the slew rate of the voltage at the node $V_x$ can be determined:

$$\text{Slew} = \Delta I_{\text{MAX}}/C_T, \quad (24)$$

so that the voltage at the node is linear in time:

$$\Delta V_x = (\text{Slew})\,\Delta t. \quad (25)$$
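The slew-rate relations can be illustrated with a short Python sketch. All numerical values in the example call (photocurrents of 6 and 3 µA and a total node capacitance of 0.38 pF) are hypothetical and chosen only to show the arithmetic; the function names are illustrative.

```python
def slew_rate(i_p1, i_p2, c_total):
    """Slew rate in V/s at node V_x: net photocurrent (A) over capacitance (F)."""
    return (i_p1 - i_p2) / c_total


def time_to_swing(delta_v, i_p1, i_p2, c_total):
    """Time in seconds for the node to move by delta_v volts at constant slew."""
    return delta_v / abs(slew_rate(i_p1, i_p2, c_total))


# Hypothetical example values, not taken from the paper
i_high, i_low = 6e-6, 3e-6     # photocurrents of the dual-rail pair, in A
c_node = 0.38e-12              # total capacitance at V_x, in F

print(f"slew = {slew_rate(i_high, i_low, c_node):.3e} V/s")
print(f"time for a 0.6 V swing = {time_to_swing(0.6, i_high, i_low, c_node) * 1e9:.1f} ns")
```

Because the node voltage is linear in time, halving the capacitance or doubling the net photocurrent halves the switching time.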

The equations above provided the basis for an expression relating the total optical power required by the system to the window size. To introduce window size as a parameter, the size of the capacitance was related to the window dimensions by a nominal sheet capacitance for the MQW of 0.1 fF/µm²:

$$C_{M1} = (0.1 \times 10^{-15})\,d_w^2, \quad (26)$$

$$C_{M2} = (0.1 \times 10^{-15})\,d_w^2, \quad (27)$$

with $d_w$ in micrometers and the capacitances in farads.

Fig. 16. CMOS SEED receiver circuit.

Fig. 17. Lumped parameter model of the CMOS SEED receiver circuit.

The gate capacitance of the transistors was fixed at 0.1 pF each. It was assumed that the photocurrent developed in the top and the bottom MQW diodes of the transmitter dual rail was in a high–low state and was given by

\[ I_{p1} = (0.5)0.6(\eta_{\text{eff}})P_{\text{IN}}, \]

\[ I_{p2} = (0.5)0.3(\eta_{\text{eff}})P_{\text{IN}}, \]

where \( \eta_{\text{eff}} \) is the efficiency of the entire optical path (including the lenslet arrays) and is related to the efficiency model offered in Section 3 (Fig. 10) and \( P_{\text{IN}} \) is the input optical power associated with a single dual-rail optical link.

Because the maximum peak-to-peak voltage swing \( \Delta V_{\text{swing}} \) is fixed (by clamping or by the MQW’s themselves) and the bit rate in bits per second, \( \text{bps} \), is related to \( \Delta t \), i.e.,

\[ \Delta t = 1/\text{bps}, \]

an equation that relates input optical power to bit rate, window size, and voltage swing was derived with Eqs. (22)–(30):

\[ P_{\text{IN}} = \frac{1}{0.15\eta_{\text{eff}}} \left[ 0.2 \times 10^{-15} d_{\text{w}}^2 + 0.2 \times 10^{-12} \right] \Delta V_{\text{swing}} \times \text{bps}; \]

the total power required for a cluster would then be, assuming that half the windows are receivers,

\[ P_{\text{TOTAL}} = \frac{M N}{2} P_{\text{IN}}. \]

Equation (32) provides the optical power required by an \( M \times N \) array as a function of window size, given a fixed bit rate and voltage swing. Extending this power equation to incorporate diode clamps involved confining the maximum swing at the node to the clamping voltage of the diodes; this holds provided that the clamping voltage lies between 0 V and just above the turn-on voltage of the diodes (nominally 0.6 V). Otherwise, the node \( V_x \) would not vary significantly from zero, and the amplifier–inverter would not switch logical states.
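For readers who want the intermediate steps, the input power expression can be recovered from the slew-rate relations as sketched below. The sketch assumes equal window sizes, $d_{w1} = d_{w2} = d_w$ (in µm), two gate capacitances of 0.1 pF each, and a node that must traverse the full swing within one bit period; these are readings of the text rather than statements taken verbatim from it.

$$
\begin{aligned}
\Delta I_{\text{MAX}} &= I_{p1} - I_{p2} = 0.5\,(0.6 - 0.3)\,\eta_{\text{eff}} P_{\text{IN}} = 0.15\,\eta_{\text{eff}} P_{\text{IN}}, \\
C_T &= 0.2 \times 10^{-15}\, d_w^2 + 0.2 \times 10^{-12}, \\
\Delta V_{\text{swing}} &= \frac{\Delta I_{\text{MAX}}}{C_T}\,\Delta t = \frac{0.15\,\eta_{\text{eff}} P_{\text{IN}}}{C_T \times \text{bps}}, \\
P_{\text{IN}} &= \frac{C_T\,\Delta V_{\text{swing}} \times \text{bps}}{0.15\,\eta_{\text{eff}}}.
\end{aligned}
$$

The voltage swing enters as a multiplicative factor, which is also what a unit check requires: farads times volts per second gives amperes, and dividing by the responsivity-weighted efficiency gives watts.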

5. Optical Power Results

To validate the lumped parameter model of the receiver circuit discussed in Section 4, a SPICE simulation of the circuit was performed and the results were compared with the equation for slew rate. The slew rate equation (24) was used to plot the voltage \( V_x \) with an incident optical pulse train as input and a clamping voltage of 0.6 V; this assumed that the clamping diode nodes, \( V_{cl} \) and \( V_{at} \), were set to zero (Fig. 16).

When the theoretical output [Fig. 18(a)] was compared with the SPICE output [Fig. 18(b)] for the node \( V_x \), the results were similar. However, because the assumptions made for the gate capacitances \( C_{\text{gn}} \) and \( C_{\text{gp}} \) in the lumped parameter model were estimates, the theoretical slew rate differed slightly from the simulated slew rate. The theoretical slew rate was

\[ 238 \times 10^9 \text{V/s} \]

and the SPICE model showed \( \sim 386 \times 10^9 \text{V/s} \). This difference simply implies that the gate capacitances assumed for the lumped parameter model were slightly overestimated. Note also that the corresponding simulated output voltage of the receiver had a 5-V swing compatible with the digital electronics of the smart pixel, but, for simplicity, it is not shown here.
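The size of the overestimate can be gauged with a one-line calculation. The sketch below assumes that both the model and the simulation were driven by the same net photocurrent and that the whole discrepancy is due to node capacitance; neither assumption is stated explicitly in the text, and the variable names are hypothetical.

```python
# Since Slew = dI_MAX / C_T, equal photocurrents imply
#   C_simulated / C_model = slew_model / slew_simulated.
# Assumption: the same net photocurrent in the model and in SPICE.

slew_model = 238e9  # V/s, theoretical slew rate quoted in the text
slew_spice = 386e9  # V/s, approximate SPICE slew rate quoted in the text

cap_ratio = slew_model / slew_spice
print(f"effective simulated capacitance is about {cap_ratio:.2f} x the modeled value")
```

Under these assumptions, the node capacitance seen in the simulation is roughly six tenths of the lumped-model value.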

Based on the consistent results obtained above for the node voltage \( V_x \), the subsequent equations for optical power versus window size were plotted with a high degree of confidence in their validity. The optical power versus window size [Eqs. (31) and (32)] and the total optical power versus window size were plotted for two bit rates: 51 and 155 Mbps (Fig. 19). The total optical power was calculated assuming that 4096 windows are required for a \( 32 \times 32 \) smart-pixel array, as discussed above. The optical power versus window size was plotted as a semilog graph over a large range of optical powers. Again, a small window size is favorable; however, a window size of up to 50 µm is still feasible if a total optical input power below 1 W is assumed. A 1-W boundary would mark the maximum output power realistically available from present-day diode lasers and fiber delivery systems.
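The trend of power versus window size can be reproduced with a minimal sketch of the power model. The efficiency value `ETA_EFF` is a hypothetical placeholder (the actual efficiency comes from the model of Section 3), the 0.6-V swing is the nominal clamp value, and the array total follows the stated rule of \( MN/2 \) links; the resulting numbers are therefore illustrative and are not a reproduction of Fig. 19.

```python
# Sketch of the optical power model; d_w is the window size in micrometers.
ETA_EFF = 0.5        # ASSUMED end-to-end optical efficiency (placeholder)
DV_SWING = 0.6       # V, swing limited by the diode clamp (nominal value)
SHEET_CAP = 0.1e-15  # F/um^2, MQW sheet capacitance from the text
GATE_CAP = 0.1e-12   # F per transistor gate, from the text


def p_in(d_w_um, bps, eta_eff=ETA_EFF, dv_swing=DV_SWING):
    """Input optical power (W) for one dual-rail link."""
    # Two MQW windows and two transistor gates load the node.
    c_total = 2 * SHEET_CAP * d_w_um**2 + 2 * GATE_CAP
    # C * dV = dI * dt with dt = 1/bps, and dI = 0.15 * eta_eff * P_IN.
    return c_total * dv_swing * bps / (0.15 * eta_eff)


def p_total(d_w_um, bps, m=32, n=32, **kwargs):
    """Total optical power (W) for an M x N array, using (M*N/2) * P_IN."""
    return (m * n / 2) * p_in(d_w_um, bps, **kwargs)


for bps in (51e6, 155e6):
    for d_w in (10, 30, 50, 80):
        print(f"{bps / 1e6:5.0f} Mbps, d_w = {d_w:2d} um: "
              f"P_IN = {p_in(d_w, bps) * 1e6:8.1f} uW, "
              f"P_TOTAL = {p_total(d_w, bps):.3f} W")
```

The quadratic capacitance term dominates once the window exceeds roughly 30 µm, which is why the required power climbs quickly for large windows at either bit rate.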

6. Discussion

In this analysis, a Gaussian-beam propagation model was used to formulate the basic parameters of a clustered-window microchannel relay. The model incorporated rigid assumptions, such as the 25-mm device plane separation and a 3σ beam waist to minimize clipping. When several less stringent assumptions were used, it was possible to reduce the number of independent design variables to two: the window size and the cluster size. The variable parameters were the lenslet size, the window density, the f/#, and the transistor count. It was assumed that a preliminary design of an optical interconnect could be done in which one or more of these parameters was chosen to be optimized.

To provide a meaningful set of data for the above parameters, several present-day technological limitations and definitions were imposed. Issues concerning the level of sophistication of the chip and the definition of a modulator-based smart-pixel array were introduced. The ability to produce low-f/# diffractive microlenses with high efficiencies was also considered. These imposed design space boundaries were used to provide meaningful ranges for the basic parameters. In addition, a photonic backplane was used as an example application of the model in order to illustrate how these basic parameters were to be interpreted.

This first-order analysis did not consider a complex optical path. No analysis was done for the beam propagating through multiple elements, except for the efficiency degradation of an example interconnect, and diffractive effects and clipping were not considered. In addition, optomechanical tolerancing was not incorporated into the model. However, a justification for the telecentric clustered approach in terms of translational and rotational misalignment was offered. This indicates that these tolerances were at least three times greater than those of a single window per microchannel and that these tolerances were feasible. The misalignment analysis offered here was used as an argument for telecentric clustered interconnects, but did not conclude that this interconnect is less sensitive to misalignment in all cases.

The optical power model was introduced to show that the range of window sizes being explored in this analysis corresponded to acceptable levels of input optical power. It was based on a simple CMOS receiver design and showed that window sizes of up to 80 μm for a 32 × 32 smart-pixel array required ~1 W of optical power under ideal operating conditions. Because a more sensitive receiver could be used, the results obtained with this receiver model can be viewed as worst case in terms of the minimum optical power required. It was also assumed in the development of the optical receiver model that the system would be perfectly aligned, and any electrical or optical cross talk was ignored. These nonidealities would uniformly increase the amount of optical power required over the entire range of window size; hence, a smaller window would be more appropriate (Fig. 19). This would correspond to a large clustering in order to satisfy some nominal value of window density (Fig. 12) and thereby lead to additional support for this type of optical interconnect: large clusters with small windows.

7. Conclusion

With a microchannel interconnect model, an optimal range for window size and cluster size with reference to a telecentric microlens interconnect was determined. The model indicated several parameters that could be optimized when applied to a photonic backplane. Specifically, the model investigated the effect of increasing window size for three cases of cluster size. It was applied to four primary parameters: the lenslet size, the f/#, the transistor count, and the window density.

The analysis showed that the lenslet-size parameter provided an upper bound of ~30 μm on the window size. This was necessary so that a 32 × 32 smart-pixel array could be implemented within a 1 cm × 1 cm chip area. The derivative of lenslet size indicated that a maximum window size of 70 μm was attainable for a 4 × 4 cluster and that this maximum decreased with increasing cluster size. The f/# parameter showed that very low-f/# lenslets are required for small windows. This, coupled with the degradation in efficiency of low-f/# diffractive lenslets, provided a lower bound on the window size of ~10 μm for any clustering case.

The transistor-count parameter provided the smart-pixel designer with a maximum number of transistors associated with one smart pixel, assuming an LSI-type chip architecture. Alternatively, it indicated that an upper bound on window size existed if a minimum number of transistors was required per smart pixel. The minimum number was chosen to be 250 transistors per smart pixel, which implies a maximum window size of ~60 μm for any clustering geometry.

Finally, the most valuable parameter presented in this analysis was the window density. A goal of 1024 channels/cm² (4096 windows/cm²) was imposed in order to realize a terabit of aggregate data throughput, as discussed in the introduction. The window-density parameter showed that the number of connections as a function of window size increased dramatically with only a relatively small amount of window clustering. It also showed that a window density of 4000 windows/cm² could be obtained with a 4 × 4 cluster of moderately small windows of ~30 µm.

The optical power model was based on a simple CMOS inverter–amplifier and an MQW S-SEED for the receiver. It also used a simplified modulator-based two-board interconnect for the optical path. This provided a necessary feasibility check in terms of the total optical power required for a given bit rate. The optical power model showed that within the range of window size explored, the optical power required for the system was feasible in terms of the limits of present-day laser technology.

Using these first-order design parameters, a system designer can choose which of these parameters is most important and begin a rigorous optimization around that parameter. If all parameters are deemed important, a range of window sizes can be determined. With regard to the photonic backplane, the window size had lower bounds of 10 and 30 µm that were due to the f/# and the window density, respectively. It had upper bounds of 30, 60, and 70 µm for lenslet size, transistor count, and maximum window size of a 4 × 4 cluster, respectively. With this analysis, a first-order design of the system can be accomplished. This could then lead to a rigorous analysis of the optical interconnect.
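When all parameters are treated as equally binding, the admissible window size is simply the intersection of the bounds listed above. The sketch below uses the bound values quoted in the text; treating every bound as a hard simultaneous constraint is an assumption made for illustration, and the dictionary names are hypothetical.

```python
# Intersect the window-size bounds quoted in the text (all values in um).
# Assumption: every bound is applied as a hard, simultaneous constraint.

lower_bounds_um = {"f/#": 10, "window density": 30}
upper_bounds_um = {
    "lenslet size": 30,
    "transistor count": 60,
    "4 x 4 cluster maximum": 70,
}

lo = max(lower_bounds_um.values())  # tightest lower bound
hi = min(upper_bounds_um.values())  # tightest upper bound

if lo <= hi:
    print(f"feasible window size: {lo} to {hi} um")
else:
    print("no window size satisfies all bounds")
```

With all five bounds active, the range collapses to a window size of about 30 µm; relaxing either the window-density or the lenslet-size requirement is what opens a wider design range.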

Finally, this paper showed that, in principle, a clustered-window approach per microchannel in a free-space optical interconnect has increased connection density and may offer a potential solution for many connection-intensive computing systems.

This work was supported by the Canadian Institute for Telecommunications Research and the BNRNT/NSERC Chair in Photonics Systems. In addition, D. V. Plant acknowledges support from the Natural Sciences and Engineering Research Council (Canada) (OPG1555159), Fonds pour la Formation de Chercheur et l’Aide à la Recherche (Quebec) (NC-1415), and the McGill University Graduate Faculty.
