Let $X\in\Bbb{R}^{N\times d}$ be a real matrix. We view $X$ as a matrix of $N$ $d$-dimensional data points; i.e., each row represents a datum in $\Bbb{R}^d$. This matrix is shown below:

$$
X
=\begin{pmatrix}
x_{11} & x_{12} & \ldots & x_{1d} \\
x_{21} & x_{22} & \ldots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \ldots & x_{Nd}
\end{pmatrix}
$$

We would like to scale the above data into a given range $(a,b)$. One could simply divide each row by its norm, but that would distort the “structure” of the data, since each row would be rescaled by a different factor. Instead, one could scale column-wise as follows:

$$
\hat{X}
=\begin{pmatrix}
\hat{x}_{11} & \hat{x}_{12} & \ldots & \hat{x}_{1d} \\
\hat{x}_{21} & \hat{x}_{22} & \ldots & \hat{x}_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\hat{x}_{N1} & \hat{x}_{N2} & \ldots & \hat{x}_{Nd}
\end{pmatrix},
$$

where

$$
\hat{x}_{ij}=a+\frac{x_{ij}-\min_j}{\max_j-\min_j}(b-a).
$$

Here, $\min_j$ and $\max_j$ denote the minimum and maximum elements of the $j$-th column of $X$, respectively.
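To make the column-wise scaling concrete, here is a minimal NumPy sketch. The function name `minmax_scale`, the example matrix, and the choice to raise an error on a constant column (where $\max_j-\min_j=0$) are assumptions made for illustration only.

```python
import numpy as np

def minmax_scale(X, a=0.0, b=1.0):
    """Linearly map each column of X onto the range [a, b].

    Hypothetical helper: column j is shifted by its minimum and divided
    by (max_j - min_j), then stretched to length (b - a) and shifted by a.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)          # min_j for every column j
    col_max = X.max(axis=0)          # max_j for every column j
    span = col_max - col_min
    if np.any(span == 0):
        # Assumption: a constant column has no well-defined scaling here.
        raise ValueError("a column of X is constant, so max_j - min_j = 0")
    return a + (X - col_min) / span * (b - a)

# Small made-up example with N = 3 data points in d = 2 dimensions.
X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [4.0, 20.0]])
X_hat = minmax_scale(X, a=-1.0, b=1.0)
```

Note that the column minimum and maximum are mapped exactly to $a$ and $b$, so the scaled values lie in the closed interval $[a,b]$.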

Now, let’s assume that, along with the data $X$, we also have a matrix $\Sigma\in\Bbb{R}^{N\times d}$,

$$
\Sigma
=\begin{pmatrix}
\sigma^2_{11} & \sigma^2_{12} & \ldots & \sigma^2_{1d} \\
\sigma^2_{21} & \sigma^2_{22} & \ldots & \sigma^2_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2_{N1} & \sigma^2_{N2} & \ldots & \sigma^2_{Nd}
\end{pmatrix}.
$$

That is, the $(i,j)$-th element of $\Sigma$, $\sigma^2_{ij}$, is a variance corresponding to the $(i,j)$-th element of $X$, $x_{ij}$. It may help to think of each $x_{ij}$ as the mean of a random variable with variance $\sigma^2_{ij}$.

The question is: given the scaling proposed above for mapping the $x_{ij}$’s into $(a,b)$, what would be an appropriate/meaningful scaling of the elements of $\Sigma$?
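One candidate, sketched here only to make the question concrete: if $\min_j$ and $\max_j$ are treated as fixed constants (an assumption, since they are themselves computed from the data), the scaling is an affine map of each entry, and an affine map $a + c\,x$ multiplies a variance by $c^2$. Under that assumption each $\sigma^2_{ij}$ would be multiplied by $\left(\frac{b-a}{\max_j-\min_j}\right)^2$. The function name `scale_variances` and the example numbers below are hypothetical.

```python
import numpy as np

def scale_variances(X, Sigma, a=0.0, b=1.0):
    """Propagate per-entry variances through column-wise min-max scaling.

    Hypothetical helper. Assumes min_j and max_j are fixed constants, so
    entry (i, j) is transformed by an affine map with slope
    (b - a) / (max_j - min_j), and its variance scales by the slope squared.
    """
    X = np.asarray(X, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    if np.any(span == 0):
        raise ValueError("a column of X is constant, so max_j - min_j = 0")
    slope = (b - a) / span           # one slope per column j
    return Sigma * slope**2          # broadcast across the rows

# Made-up example: column spans are 3 and 20, so the slopes are 1/3 and 1/20.
X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [4.0, 20.0]])
Sigma = np.array([[0.9, 4.0],
                  [0.9, 4.0],
                  [0.9, 4.0]])
Sigma_hat = scale_variances(X, Sigma, a=0.0, b=1.0)
```

The shift by $a$ and by $\min_j$ does not affect the variances; only the slope does. Whether treating $\min_j$ and $\max_j$ as constants is justified is part of what is being asked.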

In any case, if the scaling method for $X$ does not seem meaningful to you, what would you propose for scaling the $x$’s into $(a,b)$, and then the $\sigma^2$’s accordingly?
