Physics-informed Score-based Diffusion Model for Limited-angle Reconstruction of Cardiac Computed Tomography (2024)


Shuo Han, Yongshun Xu, Dayang Wang, Bahareh Morovati, Li Zhou, Jonathan S. Maltz, Senior Member, IEEE, Ge Wang, Fellow, IEEE, and Hengyong Yu, Fellow, IEEE

The paper was submitted on XXXXX. This work was supported in part by NIH/NIBIB under grants R01EB032807 and R01EB034737, and NIH/NCI under grant R21CA264772.

S. Han, Y.S. Xu, D.Y. Wang, B. Morovati, L. Zhou, and H.Y. Yu are with the Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, MA 01854, USA. J.S. Maltz is with Molecular Imaging and Computed Tomography, GE Healthcare, Waukesha, WI 53188, USA. G. Wang is with the Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 21180, USA. H.Y. Yu is the corresponding author (email: hengyong-yu@ieee.org).

Abstract

Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but typically leads to severe image degradation and motivates improved reconstruction techniques. In this paper, we propose a novel physics-informed score-based diffusion model (PSDM) for limited-angle reconstruction of cardiac CT. At sampling time, we combine a data prior from a diffusion model with a model prior obtained via an iterative algorithm and Fourier fusion to further enhance image quality. Specifically, our approach integrates the primal-dual hybrid gradient (PDHG) algorithm with score-based diffusion models, thereby enabling us to reconstruct high-quality cardiac CT images from limited-angle data. Numerical simulations and real data experiments confirm the effectiveness of our proposed approach.

{IEEEkeywords}

Computed tomography (CT), limited-angle CT reconstruction, diffusion model, iterative reconstruction, deep learning

1 Introduction

\IEEEPARstart

Cardiac computed tomography (CT) is a powerful and widely deployed tool for non-invasive imaging of the beating heart and surrounding structures [1]. Specifically, electrocardiogram (ECG)-gated cardiac CT provides three-dimensional (3D) high-resolution images, enabling detailed examination of cardiac anatomy and function [2, 3]. The evolution of CT scanners toward faster rotational times and higher tube powers is largely driven by the need to freeze the motion of the beating heart. Motion artifacts may affect the diagnosis of cardiovascular disease (CVD), which is the leading cause of death globally [4]. The main cardiac vessels are the right coronary artery (RCA), supplying the right atrium and ventricle; the left anterior descending artery (LAD), supplying the anterior and apical left ventricle; and the left circumflex artery (LCX), supplying the lateral and posterior walls of the left ventricle. In addition, there are risks associated with the radiation dose applied during conventional CT imaging, prompting the exploration of alternative approaches to reduce radiation exposure [3]–[5].

Limited-angle CT (LACT) imaging can potentially address both issues by shortening the scan time, and hence the motion-sensitive time window, and by reducing patient radiation exposure. Dose reduction benefits vulnerable patient groups and patients who undergo multiple scans [5].

However, limited-angle acquisitions provide incomplete spatial frequency coverage, making it difficult to accurately reconstruct the object [6]. The reconstructed images often contain streaking artifacts and distortions, particularly along directions for which data are missing. Existing strategies to enhance limited-angle CT image quality typically rely on iterative reconstruction (IR) algorithms. These techniques optimize an objective function that usually integrates a precise system model with an image-domain prior. One popular image prior is total variation (TV) minimization [7] and its variants [8, 9]. Although IR algorithms produce images with substantially improved quality, they may lose fine details and leave residual limited-angle artifacts. In the past few years, deep learning reconstruction (DLR) has demonstrated potential to replace and supplement IR-based methods [10, 11, 12, 13]. DLR methodologies are widely applied in the image domain, the projection domain, and occasionally, in both. However, these methods usually require paired projection or image data for supervised learning, which is challenging to acquire. This constraint may hinder the practicality and generalizability of DLR in certain applications.

Recently, generative models have been at the forefront of advances in computer vision. The landscape of generative models can be broadly divided into two categories: likelihood-based models and implicit generative models. The former directly approximate the data distribution. A representative approach is the variational auto-encoder (VAE) [14], which aims to produce an output virtually identical to a given input (e.g., an image). VAEs generate highly diverse samples and are easy to train. However, the fidelity of the generated samples typically remains suboptimal, because the encoder predicts only the distribution of the latent code and training relies on a pixel-wise loss. In contrast, implicit generative models (e.g., generative adversarial networks (GANs) [15]) adopt an indirect approach to fit the data distribution: they are trained to output instances that, upon discrimination, fall within the target distribution. This strategy can generate high-fidelity samples, but sample diversity is poor because the adversarial loss does not incentivize covering the entire data distribution. Notwithstanding their profound contributions to the field, implicit generative models are often criticized for their complicated training requirements, owing primarily to their dependence on adversarial learning [16, 17]. This complexity often leads to instability and potential mode collapse, posing challenges to their broad utility in real-world applications. Despite these shortcomings, both likelihood-based and implicit generative models have demonstrated great utility for many imaging tasks.

A central problem in machine learning is modeling complicated datasets using highly flexible probabilistic distribution families, while ensuring that learning, sampling, inference, and evaluation within these families remain analytically or computationally tractable. Inspired by the principles of non-equilibrium thermodynamics [18], diffusion models (DMs) have emerged as an alternative to existing generative models, demonstrating a unique and appealing approach to generative modeling [19, 20, 21]. The basic idea of diffusion models is to systematically and gradually degrade the structure in the data distribution through an iterative forward diffusion process [19]. This degradation is then counteracted by a reverse diffusion process that restores the structure within the data, yielding a highly flexible and easily manageable generative model. Diffusion models have demonstrated significant advances in imaging tasks, for example, super-resolution [22], image inpainting [23], magnetic resonance image reconstruction [24], and CT reconstruction [25, 26]. Notably, conventional score-based diffusion models operate within the realm of unsupervised learning, offering potential advantages such as reduced data requirements and enhanced flexibility.

Nevertheless, in the field of tomographic image reconstruction, there is an underlying desire to control the final outcome: not only image denoising and artifact removal, but also the maintenance of image quality and the enhancement of imaging metrics such as contrast resolution. Our results suggest that the proposed method achieves high-quality reconstructions from limited-angle data and mitigates false structure generation, which could enhance the reliability and clinical applicability of the reconstructed images. The primal-dual hybrid gradient (PDHG) method is often more robust to noise and other perturbations than ADMM [27]. This is beneficial in limited-angle CT reconstruction, where the data are often noisy and under-sampled.

In the pursuit of high-quality image reconstruction for limited-angle CT imaging, we introduce a novel approach that we term a physics-informed score-based diffusion model (PSDM). This approach combines a model-based optimization strategy with the sampling steps of a score-based model, allowing simultaneous utilization of both data-based (as in DLR) and model-based (as in IR) priors. First, a score-based model is employed to generate an initial image, leveraging the learned score function. Subsequently, we combine the initial image and the limited-angle CT image in the Fourier domain to exploit their complementary frequency-domain characteristics. Finally, another data-consistency step is implemented via PDHG optimization. This step is instrumental in reducing the discrepancy between the initial and target images. Furthermore, isotropic TV regularization is utilized to refine the final image.

2 Background

2.1 CT imaging model

CT acquisition is primarily affected by two kinds of noise. The first type is electronic noise, which originates from the detection system and adheres to a Gaussian distribution, being independent of object structure. The second type is photon statistical noise from inherent quantum fluctuations during X-ray emission, and it usually conforms to a Poisson distribution, the mean of which, for a particular ray, is dependent on the x-ray source beam characteristics and the structure of the imaged object. In the projection domain, the noise exhibits a complicated statistical distribution that blends both Gaussian and Poisson noise:

\[
\boldsymbol{n} \sim \mathcal{N}(0,\sigma^{2}) + \mathcal{P}(\xi(\boldsymbol{x})), \tag{1}
\]

where $\boldsymbol{n}$ is the pre-log noise variable, $\mathcal{N}(0,\sigma^{2})$ denotes the Gaussian distribution with variance $\sigma^{2}$, and $\mathcal{P}(\xi(\boldsymbol{x}))$ is the Poisson distribution in which $\xi(\boldsymbol{x})$ denotes the latent mapping from the image structure $\boldsymbol{x}$ to the distribution expectation. In a general model of CT image reconstruction, $\boldsymbol{x}$ must fulfill:

\[
\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \tilde{\boldsymbol{n}}, \tag{2}
\]

where $\boldsymbol{y}\in\mathbb{R}^{M}$ is a vectorized sinogram, $\boldsymbol{x}\in\mathbb{R}^{N}$ is a vectorized CT image, $\tilde{\boldsymbol{n}}\in\mathbb{R}^{M}$ is the post-log noise corresponding to the pre-log noise $\boldsymbol{n}$, $\boldsymbol{A}\in\mathbb{R}^{M\times N}$ is the CT system matrix, and $M$ and $N$ are respectively the total numbers of measurements in the sinogram and pixels in the image domain. Because the limited-angle CT reconstruction problem is ill-posed, a standard approach to estimate the unknown image $\boldsymbol{x}$ is to employ a regularization term:

\[
\min_{\boldsymbol{x}} \frac{1}{2}\|\boldsymbol{A}\boldsymbol{x}-\boldsymbol{y}\|_{2}^{2} + \lambda R(\boldsymbol{x}). \tag{3}
\]

The first term of Eq. (3) is a data fidelity term based on the $L_{2}$-norm. The second term, $R(\boldsymbol{x})$, is the regularizer that incorporates prior information about the solution, and $\lambda$ is a trade-off parameter that balances the two terms.
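The regularized objective in Eq. (3) can be illustrated with a small numerical sketch. The example below is illustrative only: it swaps the TV regularizer for a differentiable Tikhonov term $R(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{x}\|_{2}^{2}$ and uses a random matrix in place of the CT system matrix, so that plain gradient descent applies and the result can be checked against the closed-form regularized least-squares solution.

```python
import numpy as np

# Toy sketch of Eq. (3): minimize 0.5*||Ax - y||^2 + (lam/2)*||x||^2.
# A is a stand-in for the CT system matrix; Tikhonov replaces TV here
# so the objective stays differentiable (an assumption for illustration).
rng = np.random.default_rng(0)
M, N = 30, 20                            # measurements and pixels
A = rng.standard_normal((M, N))
x_true = rng.standard_normal(N)
y = A @ x_true + 0.01 * rng.standard_normal(M)

lam = 0.5                                # trade-off parameter lambda
step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)  # step below 1/L

x = np.zeros(N)
for _ in range(10_000):
    grad = A.T @ (A @ x - y) + lam * x   # gradient of the objective
    x = x - step * grad

# Closed-form solution of the same regularized least-squares problem.
x_closed = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
print(np.max(np.abs(x - x_closed)))      # near machine precision
```

In practice, a non-smooth prior such as TV requires proximal or primal-dual methods (as discussed in Section 3); the gradient-descent loop above only works because the Tikhonov surrogate is smooth.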

[Figure 1: Overall framework of the proposed method.]

2.2 Score-based Diffusion Model & SDEs

Score-based diffusion models are generative models in which samples from a probability density $p_{\text{data}}(\boldsymbol{x})$ can be produced via Langevin dynamics, a stochastic process used for sampling from probability distributions. In a score-based diffusion model, Langevin dynamics can be conceptualized as a methodical process for generating data points that are statistically consistent with the original dataset. It commences with an arbitrary data point and employs iterative updates to gradually align this point with the characteristics of the dataset. Central to this mechanism is the score function, $\nabla_{\boldsymbol{x}}\log p(\boldsymbol{x})$ [20], the gradient of the logarithm of the probability density. This function provides directional guidance, indicating the path towards regions of higher probability. By adhering to this guidance, Langevin dynamics effectively moves the initial arbitrary point towards areas more densely populated by actual data points.

More specifically, given a fixed step size $\delta>0$ and an initial value $\boldsymbol{x}_{0}$ (sampled from a prior distribution), the Langevin dynamics sampling process [20] can be modeled as:

\[
\boldsymbol{x}_{t} = \boldsymbol{x}_{t-1} + \frac{\delta}{2}\nabla_{\boldsymbol{x}}\log p(\boldsymbol{x}_{t-1}) + \sqrt{\delta}\,N_{t}, \tag{4}
\]

where t𝑡titalic_t represent the sampling step, and Nt𝒩(0,𝑰)similar-tosubscript𝑁𝑡𝒩0𝑰N_{t}\sim\mathcal{N}(0,\boldsymbol{I})italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∼ caligraphic_N ( 0 , bold_italic_I ) denotes random disturbance, where 𝑰𝑰\boldsymbol{I}bold_italic_I is the identity matrix. The distribution of 𝒙Tsubscript𝒙𝑇\boldsymbol{x}_{T}bold_italic_x start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT equals pdata(𝒙)subscript𝑝data𝒙p_{\text{data}}(\boldsymbol{x})italic_p start_POSTSUBSCRIPT data end_POSTSUBSCRIPT ( bold_italic_x ) when δ0𝛿0\delta\rightarrow 0italic_δ → 0 and the number of sampling steps T𝑇T\rightarrow\inftyitalic_T → ∞.From Eq.(4), to obtain samples from pdata(𝒙)subscript𝑝data𝒙p_{\text{data}}(\boldsymbol{x})italic_p start_POSTSUBSCRIPT data end_POSTSUBSCRIPT ( bold_italic_x ), we need to train a score network 𝒔𝜽(𝒙)subscript𝒔𝜽𝒙\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x})bold_italic_s start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_italic_x ) to estimate 𝒙logpdata(𝒙)subscript𝒙subscript𝑝data𝒙\nabla_{\boldsymbol{x}}\log p_{\text{data}}(\boldsymbol{x})∇ start_POSTSUBSCRIPT bold_italic_x end_POSTSUBSCRIPT roman_log italic_p start_POSTSUBSCRIPT data end_POSTSUBSCRIPT ( bold_italic_x ). This is the basic idea of score-based diffusion model.

In both vanilla [19] and score-based [20] diffusion models, the diffusion and sampling processes are manually divided into $T$ fixed steps. In reality, however, the diffusion and sampling procedures need not be deliberately segmented into steps. Instead, each can be considered a continuous transformation over time, characterized by stochastic differential equations (SDEs) [28]. This provides a generalized representation that uses continuous processes to describe diffusion models. The diffusion process using SDEs can be expressed as [28]:

\[
d\boldsymbol{x} = \boldsymbol{f}(\boldsymbol{x},t)\,dt + g(t)\,d\boldsymbol{w}, \tag{5}
\]

where $\boldsymbol{f}(\boldsymbol{x},t)$, called the drift coefficient, is a deterministic function of the data samples $\boldsymbol{x}$ and time $t$; $g(t)$ is the diffusion coefficient that introduces the stochastic component; and $d\boldsymbol{w}$ denotes the increment of a Wiener process, which introduces perturbations at any point in time and induces uncertainty. Both terms in Eq. (5) are essential to score-based diffusion models: one drives the system towards the high-density regions of the data distribution, and the other introduces randomness to ensure diversity among the generated samples. We note that $t$ in Eq. (5) is a continuous time variable, whereas it is discrete in Eq. (4); we disambiguate this notation only when necessary.

Among the possible choices of $\boldsymbol{f}$ and $g$, we choose the variance-exploding SDE (VE-SDE) [28]. Let $\boldsymbol{f}(\boldsymbol{x},t)=0$ and $g(t)=\sqrt{\frac{d[\sigma^{2}(t)]}{dt}}$. We can then simplify the forward process into a Wiener process:

\[
d\boldsymbol{x} = \sqrt{\frac{d[\sigma^{2}(t)]}{dt}}\,d\boldsymbol{w}, \tag{6}
\]

where $\sigma^{2}(t)$ is a time-dependent variance function. Because $\sigma^{2}(t)$ is designed to follow a rapidly increasing variance schedule, its derivative is guaranteed to be non-negative. In addition to the forward diffusion process, we also need to consider the reverse sampling process, in which time $t$ regresses from 1 to 0. The goal of the reverse process is to transform a simple Gaussian distribution $\mathcal{N}(0,\boldsymbol{I})$ back into the desired data distribution $p_{\text{data}}(\boldsymbol{x})$. By applying Anderson's theorem [29] to Eq. (6), we have the reverse SDE:

d𝒙=d[σ2(t)]dt𝒙logpt(𝒙)dt+d[σ2(t)]dtd𝒘¯.𝑑𝒙𝑑delimited-[]superscript𝜎2𝑡𝑑𝑡subscript𝒙subscript𝑝𝑡𝒙𝑑𝑡𝑑delimited-[]superscript𝜎2𝑡𝑑𝑡𝑑¯𝒘d\boldsymbol{x}=-\frac{d\left[\sigma^{2}(t)\right]}{dt}\nabla_{\boldsymbol{x}}%\log{p}_{t}\left(\boldsymbol{x}\right)dt+\sqrt{\frac{d\left[\sigma^{2}(t)%\right]}{dt}}d\overline{\boldsymbol{w}}.italic_d bold_italic_x = - divide start_ARG italic_d [ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_t ) ] end_ARG start_ARG italic_d italic_t end_ARG ∇ start_POSTSUBSCRIPT bold_italic_x end_POSTSUBSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x ) italic_d italic_t + square-root start_ARG divide start_ARG italic_d [ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_t ) ] end_ARG start_ARG italic_d italic_t end_ARG end_ARG italic_d over¯ start_ARG bold_italic_w end_ARG .(7)

In Eq. (7), $\overline{\boldsymbol{w}}$ denotes the reverse Wiener process. It is essential to observe that the drift term is defined by the score function $\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x})$. Hence, the problem of reverse sampling from the data distribution is transformed into a problem of score estimation. However, directly estimating the score function of a high-dimensional distribution is challenging. The denoising score matching (DSM) method allows us to estimate the score function via a denoising objective. Following the DSM framework, we minimize:

\[
\min_{\boldsymbol{\theta}} \mathbb{E}_{t,\boldsymbol{x}(t)}\left[\lambda(t)\left\|\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x}(t),t) - \nabla_{\boldsymbol{x}(t)}\log p(\boldsymbol{x}(t)\,|\,\boldsymbol{x}(0))\right\|_{2}^{2}\right]. \tag{8}
\]

In this loss function, $\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x}(t),t)$ is the score function estimated by a model (e.g., a U-Net) parameterized by $\boldsymbol{\theta}$, $\nabla_{\boldsymbol{x}(t)}\log p(\boldsymbol{x}(t)\,|\,\boldsymbol{x}(0))$ is the score of the perturbation kernel, and $\lambda(t)$ is a weighting factor that may vary with time $t$. The minimizer of this objective has the desirable property that it equals the score of the perturbed data distribution at the input location. This makes it possible to replace the score function in the reverse SDE with the learned network, so that we can approximate the reverse SDE with:

\[
d\boldsymbol{x} = -\frac{d[\sigma^{2}(t)]}{dt}\,\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x},t)\,dt + \sqrt{\frac{d[\sigma^{2}(t)]}{dt}}\,d\overline{\boldsymbol{w}}. \tag{9}
\]

By doing so, we simplify the reverse process by directly estimating the denoising score instead of the intractable exact score function. A neural network can be trained to model $\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x},t)$, with the training objective being a weighted version of the DSM loss in Eq. (8). In practice, we can leverage the variance schedule of the VE-SDE to control the amount of noise injection. Hence, the network is trained to denoise observations at different noise levels, resulting in more robust score estimation. The model's ability to handle different noise levels improves the quality of samples drawn from the learned data distribution.
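The DSM objective of Eq. (8) can be made concrete with a toy one-dimensional sketch (our own illustration, not the paper's training code): the "network" is a linear model $s(x)=ax+b$ fit by least squares, and the classical DSM result can be checked analytically, since the minimizer equals the score of the perturbed marginal, here $-(x-m)/(s^{2}+\sigma^{2})$. The names m, s, sigma below are illustrative.

```python
import numpy as np

# Toy DSM: data x0 ~ N(m, s^2), perturbation x_t = x0 + sigma * z.
# The regression target is the conditional score -(x_t - x0)/sigma^2.
rng = np.random.default_rng(0)
m, s, sigma = 1.0, 1.0, 0.5
n = 200_000

x0 = m + s * rng.standard_normal(n)          # clean samples x(0)
xt = x0 + sigma * rng.standard_normal(n)     # perturbed samples x(t)
target = -(xt - x0) / sigma**2               # score of p(x(t) | x(0))

# Least-squares fit of the linear "score network" s(x) = a*x + b.
design = np.stack([xt, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(design, target, rcond=None)

# The minimizer matches the marginal score -(x - m)/(s^2 + sigma^2),
# i.e. a -> -1/1.25 = -0.8 and b -> 1/1.25 = 0.8 for these parameters.
print(a, b)
```

Even though each regression target involves only the conditional score, the fitted model recovers the marginal score of the noisy data; this is exactly the property that lets the learned $\boldsymbol{s}_{\boldsymbol{\theta}}$ replace $\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x})$ in Eq. (9).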

In conclusion, the approach based on the VE-SDE and DSM provides a robust and effective way to estimate the score function of complex, high-dimensional distributions, enabling the transformation of a simple Gaussian distribution into the desired data distribution. The predictor-corrector (PC) sampler [28] can be used to solve Eq. (9) by discretizing the time interval $[0,1]$ into $I$ steps. In our case, $I$ is set to 1000.

2.3 Conditional predictor-corrector sampler

Unconditional generation often explores the upper limits of generative quality, while conditional generation mainly focuses on application-oriented tasks because it allows the output to be controlled according to specific requirements (e.g., generation of high-quality CT images from limited-angle projections). To address inverse problems in CT image reconstruction, conditional generation is indispensable. Generally speaking, there are two strategies for conditional generation: classifier-guidance [21] and classifier-free [30]. For CT reconstruction, the classifier-guidance approach allows the model to leverage an angular classifier that provides probabilistic conditioning guidance based on the view angles of the limited projections. This enables tighter conditioning on the measurement data and is less computationally expensive than training a new diffusion model. Hence, it is more convenient for subsequent researchers to fine-tune a pretrained model based on rich classifier gradients, thereby obviating the need to train a new model.

For limited-angle CT reconstruction, we wish to sample from a posterior distribution, defined as the conditional distribution of $\boldsymbol{x}$ given the measurement $\boldsymbol{y}$, i.e., $p(\boldsymbol{x}|\boldsymbol{y})$. According to Bayes' theorem, the score function of the posterior distribution can be computed as:

\[
\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x}|\boldsymbol{y}) = \nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x}) + \nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{y}|\boldsymbol{x}). \tag{10}
\]

Here, the likelihood enforces data consistency, thereby inducing samples that satisfy $\boldsymbol{y}=\boldsymbol{A}\boldsymbol{x}$. The calculation of Eq. (10) is decomposed into two stages: the first is a denoising step, in which the previously acquired data prior is applied to denoise the image, so we can simply use the pre-trained score function $\boldsymbol{s}_{\boldsymbol{\theta}}$; the second is projection onto the measurement subspace. Within a discrete framework, this process is:

\[
\boldsymbol{x}_{i-1}^{\prime} \leftarrow \mathrm{solve}(\boldsymbol{x}_{i-1}, \boldsymbol{s}_{\boldsymbol{\theta}}), \tag{11}
\]
\[
\boldsymbol{x}_{i} \leftarrow P_{\{\boldsymbol{x}\,|\,\boldsymbol{A}\boldsymbol{x}=\boldsymbol{y}\}}(\boldsymbol{x}_{i-1}^{\prime}), \tag{12}
\]

where "solve" represents a numerical solver of Eq. (9), and $P$ denotes the projection operator, following the projection onto convex sets (POCS) principle, that enforces the data constraint. In our case, we use the PC sampler as the solver for Eq. (11) and PDHG as the data-consistency module to solve Eq. (12). Note that, following [26, 24, 31], we apply the PDHG optimization steps directly to the noisy variables in Eq. (12); this works when sufficiently small step sizes are used to enable correction along the way. The overall framework of our proposed method appears in Fig. 1. We use a trained network $\boldsymbol{s}_{\boldsymbol{\theta}}$ to denoise and PDHG to enforce data consistency.
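The alternation of Eqs. (11) and (12) can be sketched with a toy linear system: a placeholder shrinkage stands in for the learned denoiser, and an explicit minimum-norm POCS projection stands in for the PDHG data-consistency module (both substitutions are our own, for illustration).

```python
import numpy as np

# Toy alternation of Eqs. (11)-(12): denoise, then project onto {x | Ax = y}.
rng = np.random.default_rng(0)
M, N = 10, 25                          # underdetermined, like limited-angle data
A = rng.standard_normal((M, N))
x_true = rng.standard_normal(N)
y = A @ x_true                         # noiseless measurements for this toy case

AAt_inv = np.linalg.inv(A @ A.T)       # precompute for the projection step

x = rng.standard_normal(N)             # arbitrary initialization
for _ in range(50):
    x = 0.9 * x                                  # Eq. (11): placeholder denoiser
    x = x - A.T @ (AAt_inv @ (A @ x - y))        # Eq. (12): POCS projection

print(np.max(np.abs(A @ x - y)))       # data consistency holds after projection
```

The projection is exact here because the constraint set is affine; in the paper, PDHG replaces this step so that noise and a non-smooth regularizer can be handled.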

The PC sampler, inspired by the predictor-corrector method for the numerical solution of differential equations, is utilized in the reverse process to generate new samples. Unlike traditional SDE solvers such as Euler-Maruyama (EM) and Runge-Kutta (RK) [32], this sampling strategy enhances the solution by incorporating additional information. The process involves predicting the next state of the diffusion process (the "predictor" step) and then correcting this prediction using the estimated score (the "corrector" step). In summary, the PC sampler alternates between two steps:

1. Predictor step: given the current state of the sample, predict the next state by taking a step in the direction of the gradient of the log probability of the data. This is equivalent to performing a step of gradient ascent on the log probability.

2. Corrector step: correct the prediction with a Langevin step that moves along the estimated score and adds a small amount of noise, sampled from a Gaussian distribution whose standard deviation is determined by the diffusion process.

By alternating between the above two steps, the PC sampler gradually transforms a sample in the prior distribution into a sample in the data distribution, enabling the score-based diffusion model to generate new samples that closely resemble the training data.
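The two alternating steps can be sketched on a toy one-dimensional problem in which the data distribution is $\mathcal{N}(0,1)$, so the perturbed score $-x/(1+\sigma^{2})$ is available in closed form and stands in for the trained network; the noise schedule, step count, and Langevin step size below are illustrative choices.

```python
import numpy as np

# Toy PC sampler for the reverse VE-SDE of Eq. (9) on 1-D Gaussian data.
rng = np.random.default_rng(0)

def score(x, sig):
    # Analytic score of N(0, 1) data perturbed by N(0, sig^2) noise.
    return -x / (1.0 + sig**2)

I = 200                                       # discretization steps
sigmas = np.geomspace(10.0, 0.01, I)          # decreasing noise schedule
x = sigmas[0] * rng.standard_normal(20_000)   # start from the wide prior

for i in range(1, I):
    dv = sigmas[i - 1] ** 2 - sigmas[i] ** 2  # variance removed this step
    # Predictor: one reverse-diffusion step toward the lower noise level.
    x = x + dv * score(x, sigmas[i - 1]) + np.sqrt(dv) * rng.standard_normal(x.shape)
    # Corrector: one Langevin step at the new noise level.
    eps = 0.1 * sigmas[i] ** 2                # illustrative Langevin step size
    x = x + eps * score(x, sigmas[i]) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

print(x.mean(), x.std())                      # approaches N(0, 1)
```

As the noise level anneals from $\sigma_{\max}$ down to $\sigma_{\min}$, the prior samples are gradually pulled into the data distribution, mirroring the reverse process described above.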

3 Methodology

3.1 Motivation

While the utilization of a score-based diffusion model is beneficial, it is important to acknowledge its potential limitations. Specifically, there is a risk of generating false structures or artificial elements that do not correspond to the true underlying structures in the image. This is a consequence of employing an entirely data-driven model, which relies heavily on the quality and representativeness of the input data.

To overcome this issue, we incorporate physical constraints. Physical constraints are derived from established theories or empirical observations that dictate how the data should behave, providing an additional layer of control and validation on the output of the diffusion model. This can help correct aberrations such as false structures, aligning the output more closely with the known physical truths and thus ensuring a more accurate and robust image representation. The combination of data priors and model priors has proven to be an accurate and stable approach in image reconstruction applications [33]. To combine model priors from the physics models with data priors from the diffusion model, the projection operator $P$ in Eq. (12) is replaced by the PDHG update, and a Fourier fusion strategy is applied between the diffusion-model-denoised result and the limited-angle CT image.

3.1.1 PDHG

The PDHG methodology is well established in medical image processing, and the minimization in Eq. (3) can be viewed as an optimization task. Because the objective function is usually non-differentiable, the gradient descent approach cannot be utilized directly. An alternative strategy is to apply non-smooth convex optimization techniques: proximal methods work directly with non-smooth objective functions, with a proximal step taking the place of the gradient step. In a proximal primal-dual scheme, an additional dual variable in the range of the operator is introduced, and the primal and dual variables are updated alternately. For the optimization problem below, three vector spaces are used: $I$ is the space of two-dimensional (2D) discrete images, $D$ is the space of the CT projection data, and $V$ is the space of spatial-vector-valued image arrays ($V=I^{2}$). We define the indicator function:

\[
\delta_{\mathrm{Box}(\alpha)}(x)\equiv\begin{cases}0 & \|x\|_{\infty}\leq\alpha\\ \infty & \|x\|_{\infty}>\alpha\end{cases}, \tag{13}
\]

where $\|\cdot\|_{\infty}$ is the $\infty$-norm, which selects the maximum absolute value of the components of a vector. The set $\mathrm{Box}(\alpha)$ contains vectors in which no component is greater in magnitude than $\alpha$; that is, all components lie in the interval $[-\alpha,\alpha]$. In 2D, $\mathrm{Box}(\alpha)$ is a square centered at the origin with side length $2\alpha$; for any point $(x,y)$ inside this square, both $x$ and $y$ lie within $[-\alpha,\alpha]$. Additionally, for a given vector space $X$, the notations $\mathbf{0}_{X}$ and $\mathbf{1}_{X}$ denote vectors in $X$ whose components are all set to 0 and 1, respectively. For instance, in a 2D space, $\mathbf{0}_{X}=[0,0]$ and $\mathbf{1}_{X}=[1,1]$.
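For intuition, the proximal operator of the non-smooth term $\lambda\|\cdot\|_{1}$ (the kind of step that replaces a gradient step in proximal methods) is soft-thresholding, and the Euclidean projection onto $\mathrm{Box}(\alpha)$ is a componentwise clip. The following is a small illustrative sketch, not part of the paper's pipeline:

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam * ||x||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def project_box(x, alpha):
    """Euclidean projection onto Box(alpha): clip components to [-alpha, alpha]."""
    return np.clip(x, -alpha, alpha)
```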

3.1.2 Fourier Fusion

Fourier fusion offers a promising way to mitigate limited-angle artifacts by leveraging the complementary strengths of diffusion models and Fourier-domain processing. Diffusion models excel at denoising and generating high-quality images from noisy or incomplete data. By integrating Fourier fusion into the framework, we aim to enhance spatial frequency components that are underrepresented or distorted by the generative process of the diffusion model, thus improving the overall resolution and detail of the reconstructed image. This approach utilizes a region-combination strategy in the Fourier domain, where specific frequency regions from each source are selectively merged to optimize image quality.

To ensure coherence and to maximize the use of high-quality information from the limited-angle CT image $\boldsymbol{X}_{\text{LACT}}$ (obtained with the FBP algorithm), a unified mask $M$ is adopted. $M$ is constructed based on the centered Fourier transform $\mathcal{F}$ (the Fourier transform followed by centering of the spectrum) of the limited-angle CT image. The mask identifies the frequency components that are least affected by the limited set of acquisition angles, which represent reliable information about the object's structure, and zeroes those regions in the denoised spectrum so that they are supplied by the limited-angle CT image instead. The final step in the reconstruction process applies the inverse centered Fourier transform $\mathcal{F}^{-1}$ (the inverse shift followed by the inverse transform) to the combined spectrum to convert it back to the spatial domain, thereby producing the fused image.
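A minimal NumPy sketch of this fusion step follows. The binary mask `mask` is assumed to be given (the construction described above is not shown); it zeroes the frequency regions of the denoised spectrum where the limited-angle spectrum is considered reliable.

```python
import numpy as np

def fourier_fusion(x_dn, x_lact, mask):
    """Fuse a diffusion-denoised image (x_dn) with a limited-angle FBP
    image (x_lact) in the Fourier domain.

    mask: binary array; 0 where the LACT spectrum is reliable (those regions
    of the denoised spectrum are zeroed), 1 elsewhere. Its construction is
    an assumption for this sketch.
    """
    centered_fft = lambda img: np.fft.fftshift(np.fft.fft2(img))
    inv_centered_fft = lambda spec: np.fft.ifft2(np.fft.ifftshift(spec))
    fused_spectrum = mask * centered_fft(x_dn) + centered_fft(x_lact)
    return np.real(inv_centered_fft(fused_spectrum))
```

With an all-zero mask the fused image reduces to the limited-angle image, which makes the role of the mask easy to sanity-check.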

3.2 Algorithmic Steps

A well-known primal-dual scheme is the PDHG algorithm, also referred to as the Chambolle-Pock algorithm [34]. It applies to a general form of the primal minimization and its corresponding dual maximization:

\[
\min_{x\in X}\;\bigl(F(K(x))+G(x)\bigr), \tag{14}
\]
\[
\max_{y\in Y}\;\bigl(-F^{*}(y)-G^{*}(-K^{T}y)\bigr). \tag{15}
\]

In this context, $x$ and $y$ denote vectors of finite dimensionality within the spaces $X$ and $Y$, respectively. The symbol $K$ is a linear operator mapping from $X$ to $Y$. The functions $G$ and $F$, which may not be smooth, represent convex mappings from the respective spaces $X$ and $Y$ to the set of non-negative real numbers. The superscript "$*$" represents convex conjugation.

Particularly, in the context of Eq. (3), we apply the isotropic TV semi-norm as the regularization term, denoted as $R(x)=\|(|\nabla x|)\|_{1}$. This norm has proven useful for edge-preserving regularization in the PDHG optimization algorithm [35]. The spatial-vector field $\nabla x$ corresponds to a discrete approximation of the image gradient and resides in the vector space $V$. As reported in [35], the objective function for solving Eq. (3) with PDHG can be written as:

\[
\begin{aligned}
&F(x,z)=F_{1}(x)+F_{2}(z),\\
&F_{1}(x)=\tfrac{1}{2}\|m-y\|_{2}^{2},\quad F_{2}(z)=\lambda\|(|z|)\|_{1},\\
&G(x)=0,\\
&m=Ax,\quad z=\nabla x,\\
&K=\begin{pmatrix}A\\ \nabla\end{pmatrix},
\end{aligned} \tag{16}
\]

where $x\in I$, $y\in D$, and $z\in V$. To obtain the convex conjugate of $F$, the variables $p$, $q$, and $r$ are introduced as auxiliary components that play a crucial role in the computation of the Lagrange multipliers. We then have the convex conjugates of $F$ and $G$. Calculating the convex conjugate is essential for transitioning between the primal and dual spaces, thereby facilitating the primal-dual updates in the PDHG algorithm:

\[
F^{*}(p,q)=\tfrac{1}{2}\|p\|_{2}^{2}+\langle p,y\rangle_{D}+\delta_{\mathrm{Box}(\lambda)}(|q|), \tag{17}
\]
\[
G^{*}(r)=\delta_{\mathbf{0}_{I}}(r), \tag{18}
\]

where $p\in D$, $q\in V$, and $r\in I$.

Finally, we obtain the proximal mapping of the PDHG algorithm for Eq.(3):

\[
\operatorname{prox}_{\sigma}\left[F^{*}\right](y,z)=\left(\frac{m-\sigma y}{1+\sigma},\;\frac{\lambda z}{\max\left(\lambda\mathbf{1}_{I},|z|\right)}\right), \tag{19}
\]

where $\sigma$ is the step size that controls the update of the variables in each iteration, and $\mathbf{1}_{I}$ is an image with all pixels set to 1. The proximal mapping iteratively updates the dual variables $y$ and $z$, thereby influencing the optimization of the primal variable $x$ in the PDHG algorithm and ensuring convergence to a solution that balances data fidelity and regularization.
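The two dual updates implied by this proximal mapping can be sketched as follows; the function and variable names are chosen for illustration, and the second update implements the projection onto the $\mathrm{Box}(\lambda)$ set:

```python
import numpy as np

def prox_dual_fidelity(p, Ax, y, sigma):
    """Dual proximal update for the data-fidelity term:
    p <- (p + sigma * (A x - y)) / (1 + sigma)."""
    return (p + sigma * (Ax - y)) / (1.0 + sigma)

def prox_dual_tv(q, grad_x, sigma, lam):
    """Dual proximal update for the TV term: project q + sigma * grad_x
    onto Box(lam), i.e., clip the magnitude of each component to lam."""
    v = q + sigma * grad_x
    return lam * v / np.maximum(lam, np.abs(v))
```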

Consequently, we can establish our PSDM algorithm as Algorithm 1. It should be noted that there are two iterative loops [see Eqs. (20) and (22)]: 1) the SDE denoising step, indexed by $i$; and 2) the PDHG iteration, indexed by $n$. In our case, $n$ is set to 30. We apply the Fourier fusion and then initialize the second loop with $\boldsymbol{x}=\boldsymbol{x}^{\prime}_{i}$.

\[
\boldsymbol{x}_{\text{DN}}\leftarrow\text{solve}(\boldsymbol{x}_{i},\boldsymbol{s}_{\boldsymbol{\theta}}), \tag{20}
\]
\[
\boldsymbol{x}^{\prime}_{i}\leftarrow\mathcal{F}^{-1}\left\{M\left[\mathcal{F}(\boldsymbol{x}_{\text{DN}})\right]+\mathcal{F}(\boldsymbol{X}_{\text{LACT}})\right\}, \tag{21}
\]
\[
\boldsymbol{x}_{i-1}\leftarrow\operatorname*{argmin}_{\boldsymbol{x}}\;\tfrac{1}{2}\left\|\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}+\lambda\|(|\nabla\boldsymbol{x}|)\|_{1}. \tag{22}
\]

The purpose of the initial iteration is to transform the sample toward the learned data distribution. The succeeding iteration can be interpreted as a data-fidelity step that ensures congruity between the generated data and the provided measurements. Specifically, Fourier fusion is applied only between steps 400 and 800 of the sampling process. This timing is chosen because the image quality at the very beginning is too poor for Fourier fusion to be effective, while introducing it too late in the process, particularly at the final stages, can negatively impact the quality of the final reconstructed image.

Algorithm 1 PSDM

1: Input: score model $\boldsymbol{s}_{\boldsymbol{\theta}}\approx\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x})$, $N$, $I$, $\lambda$

2: Initialization: $x_{N}\sim\mathcal{N}(0,\sigma_{T}^{2}I)$, $L\leftarrow\|(\boldsymbol{A},\nabla)\|_{2}$, $\tau\leftarrow 1/L$, $\sigma\leftarrow 1/L$, $\theta\leftarrow 1$

3: for $i=I-1$ to $0$ do

4:  $\boldsymbol{x}_{\text{DN}}\leftarrow\text{solve}(\boldsymbol{x}_{i},\boldsymbol{s}_{\boldsymbol{\theta}})$ ▷ SDE denoising

5:  $\boldsymbol{x}^{\prime}_{i}\leftarrow\mathcal{F}^{-1}\left\{M\left[\mathcal{F}(\boldsymbol{x}_{\text{DN}})\right]+\mathcal{F}(\boldsymbol{X}_{\text{LACT}})\right\}$ ▷ Fourier fusion

6:  Initialization: $p_{0},q_{0}\leftarrow 0$, $n\leftarrow 0$, $\boldsymbol{x}_{n}\leftarrow 0$, $\boldsymbol{x}^{\prime}_{n}\leftarrow\boldsymbol{x}^{\prime}_{i}$

7:  repeat

8:   $p_{n+1}\leftarrow\left(p_{n}+\sigma(\boldsymbol{A}\boldsymbol{x}^{\prime}_{n}-y)\right)/(1+\sigma)$

9:   $q_{n+1}\leftarrow\lambda\left(q_{n}+\sigma\nabla\boldsymbol{x}^{\prime}_{n}\right)/\max\left(\lambda\mathbf{1}_{I},|q_{n}+\sigma\nabla\boldsymbol{x}^{\prime}_{n}|\right)$

10:   $\boldsymbol{x}_{n+1}\leftarrow\boldsymbol{x}_{n}-\tau\boldsymbol{A}^{T}p_{n+1}+\tau\,\mathrm{div}\,q_{n+1}$

11:   $\boldsymbol{x}^{\prime}_{n+1}\leftarrow\boldsymbol{x}_{n+1}+\theta(\boldsymbol{x}_{n+1}-\boldsymbol{x}_{n})$

12:   $n\leftarrow n+1$

13:  until $n\geq N$

14:  $x_{i}\leftarrow x^{\prime}_{n+1}$ ▷ after $N$ PDHG iterations

15: return $x_{0}$
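The inner PDHG loop (lines 6-13 of Algorithm 1) can be sketched in NumPy as follows. A generic linear operator `A` and its adjoint `At` stand in for the CT projector, the finite-difference gradient/divergence pair and the operator-norm estimate `L` are assumptions, and the isotropic magnitude in the TV dual update is taken across the gradient components.

```python
import numpy as np

def grad2d(u):
    """Forward-difference image gradient with Neumann boundary."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div2d(g):
    """Discrete divergence, chosen so that <grad2d(u), g> = -<u, div2d(g)>."""
    gx, gy = g
    dx = np.zeros_like(gx)
    dy = np.zeros_like(gy)
    dx[0, :] = gx[0, :]
    dx[1:-1, :] = gx[1:-1, :] - gx[:-2, :]
    dx[-1, :] = -gx[-2, :]
    dy[:, 0] = gy[:, 0]
    dy[:, 1:-1] = gy[:, 1:-1] - gy[:, :-2]
    dy[:, -1] = -gy[:, -2]
    return dx + dy

def pdhg_tv(A, At, y, x0, lam=0.1, n_iter=30, L=1.0):
    """Inner PDHG loop (Algorithm 1, lines 6-13), sketched with a generic
    linear operator A / adjoint At in place of the CT projector."""
    tau = sigma = 1.0 / L
    theta = 1.0
    x = x0.copy()
    xbar = x0.copy()
    p = np.zeros_like(y)
    q = np.zeros_like(grad2d(x0))
    for _ in range(n_iter):
        # Dual updates (lines 8-9).
        p = (p + sigma * (A(xbar) - y)) / (1.0 + sigma)
        v = q + sigma * grad2d(xbar)
        mag = np.sqrt((v ** 2).sum(axis=0, keepdims=True))
        q = lam * v / np.maximum(lam, mag)
        # Primal update and over-relaxation (lines 10-11).
        x_new = x - tau * At(p) + tau * div2d(q)
        xbar = x_new + theta * (x_new - x)
        x = x_new
    return xbar
```

With `A` set to the identity this reduces to TV denoising, which is a convenient way to sanity-check the implementation.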


4 Experiments and results

In this section, we first provide an overview of our cardiac CT simulation dataset and outline the essential steps for training and evaluating the proposed method. We then compare our results with those obtained from Diffusion-MBIR, another state-of-the-art diffusion model used for CT reconstruction. We also conduct a comparison against several fully-supervised methods; specifically, we use TomoGAN (a GAN-based generative model) and FBPConvNet as our baselines. Finally, we include PDHG-TV, an iterative method with total-variation regularization, in our study.

4.1 Numerical simulations

4.1.1 Dataset preparation and network training

We use the 4D Extended Cardiac-Torso (XCAT) phantom version 2 to generate cardiac CT phantom data. XCAT is a multimodal phantom developed at Duke University [36, 37]. It can produce a 4D attenuation-coefficient phantom that simulates a patient with a beating heart, from which a dynamic cardiac CT scan is generated. We empirically calibrate the tube current-time product (mAs) so that the noise level of the simulated CT images matches that of real cardiac CT images; in our study, the mAs per projection is set to 0.25. We model a single cardiac cycle as lasting one second. Within this framework, each cardiac cycle is divided into 200 discrete phases sampled at equal time intervals to ensure precise temporal resolution. Using these parameters, we generate simulated images for 10 individuals and reconstruct images at phases ranging from 20% to 80% with an interval of 12%. We employ an equiangular cone-beam projection geometry based on a model of a GE CT scanner. The detector array comprises 835 elements, and a complete scan encompasses 984 views. The field of view (FOV) covers 50 × 50 $\mathrm{cm^{2}}$. The distance between the X-ray source and the origin is set to 53.852 $\mathrm{cm}$. In this study, a cross-validation approach is employed to train our model, which is based on the noise conditional score network plus plus (NCSNPP) [28]. The database comprises 10 patient datasets, each containing 2D slices of 256 × 256 pixels, for a total of 3,714 slices. In each fold of the cross-validation, nine patient datasets (3,362 slices) are used to train the model, while the remaining patient dataset (352 slices) is used for testing. This process is repeated 10 times, with each patient dataset being used exactly once as the testing set.
Specifically, in the simulation results section, phase 100 of patient 153 is labeled as Case 1, and phase 112 as Case 2.

For inference, we adopt the PC sampling strategy with $I=1{,}000$. In every iteration, data consistency is enforced by the PDHG algorithm with a sequence of 30 iterations. The experiments are conducted on a high-performance workstation equipped with an Intel i9-9920X CPU operating at 3.50 GHz and an NVIDIA GeForce 2080 Ti graphics processing unit (GPU). The proposed method is implemented in the PyTorch framework and optimized using the Adam optimizer. The batch size is set to 1, and the network is trained for 200 epochs. Notably, the Operator Discretization Library (ODL) [38] is utilized to implement the PDHG algorithm, and the ASTRA toolbox is adopted to facilitate fast forward and backward projections for image reconstruction in X-ray tomography.

4.1.2 Simulation Results

Fig. 2 shows the representative reconstruction results for patient 153 at phases 100 and 112, utilizing different methods for 120-degree limited-angle reconstruction. The first and third rows display the reconstruction results, while the second and fourth rows provide the corresponding SSIM maps for comparative analysis. Under the 120-degree condition, most of the methods achieve satisfactory results, except for FBP and PDHG-TV, which are affected by strong limited-angle artifacts.


FBPConvNet and TomoGAN show compromised edge quality along oblique directions, while the Diffusion-MBIR and PSDM demonstrate superior edge preservation. However, the Diffusion-MBIR images lack the finer details that are perceivable in PSDM images.

For further comparison, two regions of interest (ROIs) are extracted from Case 1 and Case 2 and magnified in Fig. 3. Upon close examination of the structures, specifically those indicated by the red arrows, it is evident that the images are blurred and distorted when processed by PDHG-TV, FBPConvNet, and TomoGAN. Conversely, Diffusion-MBIR and PSDM yield superior image quality. However, it should be noted that the Diffusion-MBIR images are not entirely devoid of imperfections, as they retain some noise and exhibit minor distortions at the locations indicated by the red arrows. Furthermore, in the area indicated by the red circles, PSDM provides a clearer reconstruction of the left anterior descending artery (LAD) branches than Diffusion-MBIR.

Table 1. Quantitative results for 120-view and 90-view limited-angle reconstruction.

| Method | PSNR ↑ (120-view) | SSIM ↑ | HC ↑ | LBP-TS ↓ | PSNR ↑ (90-view) | SSIM ↑ | HC ↑ | LBP-TS ↓ |
|---|---|---|---|---|---|---|---|---|
| PDHG-TV [35] | 27.856 | 0.860 | 0.911 | 0.191 | 24.067 | 0.765 | 0.498 | 0.201 |
| FBPConvNet [39] | 30.552 | 0.912 | 0.957 | 0.023 | 27.82 | 0.870 | 0.821 | 0.008 |
| TomoGAN [40] | 24.698 | 0.854 | 0.781 | 0.009 | 22.861 | 0.818 | 0.813 | 0.007 |
| Diffusion-MBIR [26] | 32.991 | 0.902 | 0.797 | 0.011 | 27.01 | 0.827 | 0.743 | 0.01 |
| PSDM (ours) | 35.56 | 0.938 | 0.942 | 0.0008 | 30.60 | 0.901 | 0.912 | 0.0015 |

We conduct further tests using a 90-degree limited-angle dataset. The results, along with the magnified ROI images, are presented in Fig. 4. We observe that the PDHG-TV method exhibits noticeable limited-angle artifacts. Both FBPConvNet and TomoGAN demonstrate varying degrees of deformation at the diagonal edges. The diffusion-based methods appear more stable, without evident structural deformation. Similar to the 120-degree limited-angle reconstruction results, our PSDM method outperforms Diffusion-MBIR in reconstructing finer structural details in the LAD area, as highlighted by the red circles.

4.1.3 Quantitative Evaluation

A quantitative assessment is performed for the different competing reconstruction techniques. In addition to the commonly used SSIM and PSNR, we also employ histogram correlation (HC) and local-binary-pattern-based texture similarity (LBP-TS) as evaluation metrics. HC measures the similarity between the histograms of the reconstruction results and the labels. A high correlation indicates that the images have similar intensity distributions, which is crucial for tasks like CT reconstruction where maintaining the intensity distribution is desirable. LBP-TS evaluates the texture similarity between two images by extracting local binary patterns from them and comparing the patterns [41]; smaller values indicate higher textural similarity.
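As an illustration, HC can be computed as the Pearson correlation between intensity histograms. This is a sketch; the bin count and intensity range are arbitrary choices here, not the paper's settings.

```python
import numpy as np

def histogram_correlation(img_a, img_b, bins=64, value_range=(0.0, 1.0)):
    """Histogram correlation (HC): Pearson correlation between the intensity
    histograms of two images. bins and value_range are illustrative choices."""
    ha, _ = np.histogram(img_a, bins=bins, range=value_range)
    hb, _ = np.histogram(img_b, bins=bins, range=value_range)
    ha = ha.astype(float) - ha.mean()
    hb = hb.astype(float) - hb.mean()
    denom = np.sqrt((ha ** 2).sum() * (hb ** 2).sum())
    return float((ha * hb).sum() / denom) if denom > 0 else 1.0
```

Two images with identical intensity distributions give HC = 1, and the value decreases as the distributions diverge.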

The images reconstructed in the two scenarios (120° and 90°) are analyzed, and the quantitative metrics are presented in Table 1. Consistent with the visual impressions, PSDM yields more favorable quantitative outcomes than the PDHG-TV, FBPConvNet, TomoGAN, and Diffusion-MBIR methods. In particular, PSDM obtains the highest PSNR, SSIM, and HC values in all cases, and consistently achieves the smallest (most desirable) LBP-TS values. The empirical findings from this quantitative analysis provide a substantive validation of the merits of the PSDM method, underlining its superior performance for limited-angle image reconstruction under the current test conditions.

4.2 Experiments on clinical data

For cardiac CT imaging, it is crucial to freeze the beating heart to avoid motion artifacts. However, conventional ECG gating does not work well for patients with tachycardia or arrhythmias, and the radiation exposure is relatively high due to the overlapped continuous scanning needed for retrospective gating. By using limited-angle reconstruction techniques, temporal resolution can be significantly improved, allowing clearer visualization of fast-moving structures such as heart valves and myocardial walls.

4.2.1 Stanford AIMI COCA results

To further our evaluation on clinical data, the Stanford AIMI COCA dataset [42] is employed. The COCA dataset is a compilation of 789 gated CT scans, each corresponding to an individual patient. On average, every patient contributes roughly 50 image slices, each with a resolution of 512 × 512 pixels. The NCSNPP model is trained with data from 60 patients, totaling approximately 3,400 image slices. Apart from the image size, all other parameters remain consistent with the simulation training. The reconstruction results from 120 views appear in Fig. 5. Notably, our proposed method produces images of enhanced quality characterized by prominent features and sharp boundaries; this is especially discernible in the areas and structures underscored by the red circles and arrows. In contrast, the other algorithms tend to obscure certain details and introduce blurring.


4.2.2 Clinical cardiac results

Generalization is a major issue for deep-learning-based reconstruction methods, especially for models requiring supervised learning. If a model is trained strictly on data with a specific angular range (e.g., 0-120°), it will not generalize well to other angular ranges. However, different clinical situations may require different angular ranges; for instance, certain anatomical regions or pathologies are better visualized with a specific angular range. To further demonstrate the advantages of PSDM, the algorithm is applied to a real clinical cardiac dataset. With the approval of the Institutional Review Board of the University of Massachusetts Lowell, a deidentified clinical cardiac CT dataset is obtained from a GE HD 750 scanner. The patient was scanned in axial mode, with 1,520 projections acquired over an angular range of 556° (≈ 1.54 rotations). Each projection row has 835 elements at a 1.095 mm pitch. The source-to-rotation-center distance is 538.5 mm, and the source-to-detector distance is 946.7 mm. Reconstruction is performed by FBP with an equiangular geometry, and the image size is set to 512 × 512 to accelerate the sampling process. Since the models trained by supervised learning (FBPConvNet and TomoGAN) can usually only process a specific angular range, we compare only the two unsupervised methods: Diffusion-MBIR and PSDM. During the sampling stage, both methods directly utilize the checkpoints trained on the aforementioned Stanford AIMI COCA dataset. Fig. 6 shows the 120-view limited-angle reconstruction images from different angular ranges. We find that both Diffusion-MBIR and PSDM produce consistent and apparently acceptable results. However, compared with Diffusion-MBIR, the results produced by PSDM exhibit fewer limited-angle artifacts and appear to provide a more realistic representation. These differences can be seen in the areas and structures underscored by the red circles and arrows.
Specifically, in the atrial region denoted by the red arrow, PSDM produces more stable results, while Diffusion-MBIR still exhibits limited-angle artifacts in the atria.


4.3 Ablation study

In the ablation study, we explore the impact of various components and parameters of our proposed method, focusing on three aspects: the number of reverse diffusion steps, the effect of the TV term, and how the Fourier fusion module affects the denoising process. This examination is crucial to understanding the contribution of each element to the overall performance of the system.

[Fig. 7: reconstruction results with and without the TV term.]
[Fig. 8: comparison of PSDM and Diffusion-MBIR versus the number of reverse diffusion steps.]

Fig. 7 presents a comparative analysis of the reconstruction results with and without the TV term. The quantitative metrics (SSIM and PSNR) indicate that the TV term improves both the perceptual quality and the fidelity of the reconstructed images. Visually, results with the TV term exhibit less noise and more consistent structural integrity than those without the TV prior. Fig. 8 compares PSDM and Diffusion-MBIR with respect to the number of reverse diffusion steps. PSDM demonstrates faster convergence than Diffusion-MBIR; specifically, between 600 and 800 steps, Diffusion-MBIR exhibits higher noise levels than PSDM. Fig. 9 provides a quantitative comparison of the performance with and without the Fourier fusion module. Between 400 and 800 reverse steps, PSDM with the Fourier fusion module outperforms the variant without it. Ultimately, both the PSNR and SSIM metrics show a slight improvement when the Fourier fusion module is utilized, highlighting its benefit.
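The general idea behind Fourier-domain fusion of a model-based image and a diffusion sample can be sketched as follows. This is a minimal illustration, not the paper's exact rule: the cutoff `radius` and the sharp low-pass/high-pass split are hypothetical choices made here for clarity.

```python
import numpy as np

def fourier_fusion(x_model, x_diffusion, radius=0.2):
    """Blend two images in the Fourier domain: keep the low-frequency
    band (data-consistent global structure) from the model-based
    reconstruction and the high-frequency band (fine texture) from the
    diffusion sample. `radius` is an assumed cutoff, expressed as a
    fraction of the Nyquist frequency."""
    H, W = x_model.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    low_pass = (np.sqrt(fx**2 + fy**2) <= radius * 0.5).astype(float)

    F_model = np.fft.fft2(x_model)
    F_diff = np.fft.fft2(x_diffusion)
    fused = low_pass * F_model + (1.0 - low_pass) * F_diff
    return np.real(np.fft.ifft2(fused))

# Sanity check: fusing an image with itself must return the image.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
out = fourier_fusion(img, img)
print(np.allclose(out, img))  # True
```

In practice a soft (e.g., Gaussian-tapered) mask would avoid ringing at the band boundary; the hard mask above is kept only for readability.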

[Fig. 9: quantitative comparison of performance with and without the Fourier fusion module.]

5 Discussion & Conclusion

In the realm of language models, we have witnessed the rise of models such as ChatGPT, which have achieved impressive generalizability. Similarly, the next leap in imaging may be a generalized image model, versatile across tasks and less reliant on specific datasets. While current computational imaging methods merge traditional and modern data-driven techniques, they often falter in unfamiliar conditions due to dataset dependencies. This highlights the need for diverse datasets, which are hard to procure in areas such as cardiac CT. The score-based diffusion model offers a solution: it can generate new samples from a learned data distribution without requiring a network architecture specific to the problem at hand. Usually a standard architecture such as the U-Net can be employed as the denoiser, and adversarial learning is not required. Thus, diffusion models fill an essential gap in the landscape of generative models, offering an efficient and uncomplicated approach to modeling complex data distributions. Such a streamlined design, combined with a straightforward training process, offers a powerful alternative to models that require either intricate design or complex adversarial training. As an unsupervised deep learning model, it is adaptable and less dataset-dependent. By incorporating physics-based methods, it emphasizes understanding the physical processes rather than relying on data alone, broadening the range of conditions it can handle and enhancing generalizability.

The combination of PDHG and the SDE framework for CT reconstruction suggests that PDHG could potentially be replaced with another model-based reconstruction algorithm. However, PDHG, known for its effectiveness in solving non-smooth optimization problems, particularly in image processing, may have a specific synergy with the SDE-based approach. First, the primal-dual nature of PDHG could be especially adept at handling the noise model or the data fidelity term as formulated in the SDE approach. Second, PDHG has strong convergence properties, especially for convex optimization problems with non-smooth terms. If SDE-based denoising leads to an optimization landscape that benefits from these convergence properties, this would be a significant justification for using PDHG over other algorithms.
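To make the PDHG iteration pattern concrete, the following sketch applies the Chambolle-Pock primal-dual updates [34] to a 1-D TV denoising problem. The identity forward operator and the step sizes are simplifying assumptions for illustration; in PSDM the operator is the limited-angle projection matrix and the problem is solved per sampling step.

```python
import numpy as np

def pdhg_tv(b, lam=0.5, n_iter=500):
    """Chambolle-Pock PDHG for min_x 0.5*||x - b||^2 + lam*||Dx||_1,
    a 1-D TV denoising toy standing in for the CT data-fidelity
    subproblem (here A = identity; in PSDM A is the projector)."""
    n = len(b)
    # Forward-difference operator D (the non-smooth TV term acts on Dx).
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0

    L = 2.0                  # upper bound on ||D|| for finite differences
    tau = sigma = 0.9 / L    # tau * sigma * L^2 < 1 ensures convergence

    x = b.copy()
    x_bar = b.copy()
    y = np.zeros(n - 1)
    for _ in range(n_iter):
        # Dual ascent, then projection onto the l_inf ball of radius lam
        # (the prox of the conjugate of lam*||.||_1).
        y = np.clip(y + sigma * (D @ x_bar), -lam, lam)
        # Primal descent with the prox of 0.5*||x - b||^2.
        x_new = (x - tau * (D.T @ y) + tau * b) / (1.0 + tau)
        # Over-relaxation step characteristic of Chambolle-Pock.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

rng = np.random.default_rng(1)
truth = np.repeat([0.0, 1.0, 0.0], 50)   # piecewise-constant phantom
noisy = truth + 0.2 * rng.standard_normal(truth.size)
denoised = pdhg_tv(noisy)
print(np.abs(denoised - truth).mean() < np.abs(noisy - truth).mean())  # True
```

The same primal-dual structure carries over to the CT problem by replacing the identity with the system matrix and stacking the projection and gradient operators into one linear map.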

To demonstrate the advantage of PDHG, a set of experiments is conducted to evaluate its importance within PSDM. The major difference between PSDM and Diffusion-MBIR is the data-consistency component: PSDM employs the PDHG algorithm, which demonstrates superior effectiveness and robustness compared to the ADMM algorithm used in Diffusion-MBIR. It is well known that the number of iterations is an important parameter in optimization algorithms. Typically, increasing the number of iterations improves the final results, but at the cost of a significant rise in computational time. Fig. 10 compares the time required by PSDM and Diffusion-MBIR to sample one image as a function of the number of iterations. As the iteration number increases, the computational cost of Diffusion-MBIR increases dramatically, whereas the sampling time of the proposed PSDM remains relatively stable and acceptable. Furthermore, Fig. 11 reveals that Diffusion-MBIR initially achieves superior PSNR and SSIM values compared to PSDM. However, as the iteration number grows, PSDM shows a significant improvement in both PSNR and SSIM, while Diffusion-MBIR appears to plateau, suggesting a performance ceiling.

[Fig. 10: sampling time per image for PSDM and Diffusion-MBIR versus the number of iterations.]
[Fig. 11: PSNR and SSIM of PSDM and Diffusion-MBIR versus the number of iterations.]

However, while PSDM exhibits a significantly reduced sampling time compared to Diffusion-MBIR, it falls short in applications requiring real-time artifact removal. This is a crucial aspect to address, because accelerating the sampling procedure could greatly enhance the model’s applicability in time-sensitive tasks, such as clinical settings where quick decision-making is essential. While our proposed approach is empirically driven, we continue to investigate the supporting theory. Although PSDM shows good performance on a variety of datasets, its adaptability has limits and there is room to further improve image quality, indicating a need for further refinement of its ability to generalize across more diverse data types. Theoretically, applying projection steps to the noisy variable, as done in both Diffusion-MBIR and PSDM, is not ideal, because the noisy variable does not belong to the subspace associated with the system matrix $A$ due to the added noise. In the future, we plan to adopt a more accurate probabilistic formulation such as DPS [43]. DPS estimates the mean of $x_0$ from $x_t$ and applies the data fidelity term to the estimated $x_0$. However, this introduces a Jacobian term, and a hard data-consistency method has been proposed to avoid it [44]. We will consider incorporating this approach in the future.
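The DPS-style estimate of $x_0$ from $x_t$ mentioned above rests on Tweedie's formula: under Gaussian noising, the posterior mean of $x_0$ given $x_t$ is $x_t + \sigma_t^2 \, \nabla \log p_t(x_t)$. A toy numerical check, assuming Gaussian data so that the score is available in closed form (a network supplies it in practice):

```python
import numpy as np

rng = np.random.default_rng(7)
s, sig = 1.0, 0.5                      # data std and noise level sigma_t
n = 100_000

x0 = s * rng.standard_normal(n)        # clean "data"
xt = x0 + sig * rng.standard_normal(n) # noised variable at level sigma_t

def score(x):
    # Analytic score of p_t = N(0, s^2 + sig^2) for this toy case.
    return -x / (s**2 + sig**2)

# Tweedie's formula: posterior mean E[x0 | xt].
x0_hat = xt + sig**2 * score(xt)

# The posterior-mean estimate beats the naive guess xt in MSE.
mse_hat = float(np.mean((x0_hat - x0) ** 2))
mse_raw = float(np.mean((xt - x0) ** 2))
print(mse_hat < mse_raw)  # True
```

Applying data fidelity to this denoised estimate, rather than to the noisy $x_t$ itself, is precisely what avoids projecting a variable that lies outside the range of the forward model.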

In conclusion, while data-driven methods have already revolutionized the field of imaging and computer vision, their limitations in terms of generalizability and data dependency cannot be overlooked. The physics-informed score-based diffusion model, by synergizing data- and model-based methods, offers a promising direction, emphasizing the importance of understanding the underlying physics. This approach not only mitigates the challenges posed by data dependency but also broadens the horizons for tackling diverse conditions in cardiac CT reconstruction, without a significant increase in computational cost.

References

  • [1] K. Korsholm, S. Berti, X. Iriart, J. Saw, D. D. Wang, H. Cochet, D. Chow, A. Clemente, O. De Backer, J. Møller Jensen et al., “Expert recommendations on cardiac computed tomography for planning transcatheter left atrial appendage occlusion,” Cardiovascular Interventions, vol. 13, no. 3, pp. 277–292, 2020.
  • [2] D. Bharkhada, H. Yu, and G. Wang, “Knowledge-based dynamic volumetric cardiac computed tomography with saddle curve trajectory,” Journal of Computer Assisted Tomography, vol. 32, no. 6, p. 942, 2008.
  • [3] G. Wang, S. Zhao, and D. Heuscher, “A knowledge-based cone-beam x-ray CT algorithm for dynamic volumetric cardiac imaging,” Medical Physics, vol. 29, no. 8, pp. 1807–1822, 2002.
  • [4] G. A. Roth, G. A. Mensah, C. O. Johnson, G. Addolorato, E. Ammirati, L. M. Baddour, N. C. Barengo, A. Z. Beaton, E. J. Benjamin, C. P. Benziger et al., “Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study,” Journal of the American College of Cardiology, vol. 76, no. 25, pp. 2982–3021, 2020.
  • [5] D. J. Brenner and E. J. Hall, “Computed tomography – an increasing source of radiation exposure,” New England Journal of Medicine, vol. 357, no. 22, pp. 2277–2284, 2007.
  • [6] L. Li, K. Kang, Z. Chen, L. Zhang, Y. Xing, H. Yu, and G. Wang, “An alternative derivation and description of Smith’s data sufficiency condition for exact cone-beam reconstruction,” Journal of X-Ray Science and Technology, vol. 16, no. 1, pp. 43–49, 2008.
  • [7] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Physics in Medicine & Biology, vol. 53, no. 17, p. 4777, 2008.
  • [8] S. Niu, Y. Gao, Z. Bian, J. Huang, W. Chen, G. Yu, Z. Liang, and J. Ma, “Sparse-view x-ray CT reconstruction via total generalized variation regularization,” Physics in Medicine & Biology, vol. 59, no. 12, p. 2997, 2014.
  • [9] Z. Zhang, B. Chen, D. Xia, E. Y. Sidky, and X. Pan, “Directional-TV algorithm for image reconstruction from limited-angular-range data,” Medical Image Analysis, vol. 70, p. 102030, 2021.
  • [10] G. Wang, J. C. Ye, and B. De Man, “Deep learning for tomographic image reconstruction,” Nature Machine Intelligence, vol. 2, no. 12, pp. 737–748, 2020.
  • [11] B. Morovati, R. Lashgari, M. Hajihasani, and H. Shabani, “Reduced deep convolutional activation features (R-DeCAF) in histopathology images to improve the classification performance for breast cancer diagnosis,” Journal of Digital Imaging, Aug. 2023.
  • [12] S. Han, Y. Zhao, F. Li, D. Ji, Y. Li, M. Zheng, W. Lv, X. Xin, X. Zhao, B. Qi et al., “Dual-path deep learning reconstruction framework for propagation-based x-ray phase-contrast computed tomography with sparse-view projections,” Optics Letters, vol. 46, no. 15, pp. 3552–3555, 2021.
  • [13] F. Li, Y. Zhao, S. Han, D. Ji, Y. Li, M. Zheng, W. Lv, J. Jian, X. Zhao, and C. Hu, “Physics-informed deep neural network reconstruction framework for propagation-based x-ray phase-contrast computed tomography with sparse-view projections,” Optics Letters, vol. 47, no. 16, pp. 4259–4262, 2022.
  • [14] D. P. Kingma, M. Welling et al., “An introduction to variational autoencoders,” Foundations and Trends in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.
  • [15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
  • [16] M. Arjovsky and L. Bottou, “Towards principled methods for training generative adversarial networks,” in International Conference on Learning Representations, 2016.
  • [17] Q. Wu, R. Gao, and H. Zha, “Bridging explicit and implicit deep generative models via neural Stein estimators,” in Advances in Neural Information Processing Systems, 2021.
  • [18] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning. PMLR, 2015, pp. 2256–2265.
  • [19] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
  • [20] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [21] P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 8780–8794, 2021.
  • [22] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, “Image super-resolution via iterative refinement,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4713–4726, 2022.
  • [23] B. Kawar, M. Elad, S. Ermon, and J. Song, “Denoising diffusion restoration models,” Advances in Neural Information Processing Systems, vol. 35, pp. 23593–23606, 2022.
  • [24] H. Chung, B. Sim, and J. C. Ye, “Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12413–12422.
  • [25] Y. Song, L. Shen, L. Xing, and S. Ermon, “Solving inverse problems in medical imaging with score-based generative models,” in International Conference on Learning Representations, 2021.
  • [26] H. Chung, D. Ryu, M. T. McCann, M. L. Klasky, and J. C. Ye, “Solving 3D inverse problems using pre-trained 2D diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22542–22551.
  • [27] N. Komodakis and J.-C. Pesquet, “Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems,” IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 31–54, 2015.
  • [28] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021.
  • [29] B. D. Anderson, “Reverse-time diffusion equation models,” Stochastic Processes and their Applications, vol. 12, no. 3, pp. 313–326, 1982.
  • [30] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” in NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
  • [31] H. Chung and J. C. Ye, “Score-based diffusion models for accelerated MRI,” Medical Image Analysis, vol. 80, p. 102479, 2022.
  • [32] P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations. Springer Science & Business Media, 2013, vol. 23.
  • [33] W. Wu, D. Hu, W. Cong, H. Shan, S. Wang, C. Niu, P. Yan, H. Yu, V. Vardhanabhuti, and G. Wang, “Stabilizing deep tomographic reconstruction: Part A. Hybrid framework and experimental results,” Patterns, vol. 3, no. 5, 2022.
  • [34] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011.
  • [35] E. Y. Sidky, J. H. Jørgensen, and X. Pan, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle–Pock algorithm,” Physics in Medicine & Biology, vol. 57, no. 10, p. 3065, 2012.
  • [36] W. P. Segars, G. Sturgeon, S. Mendonca, J. Grimes, and B. M. Tsui, “4D XCAT phantom for multimodality imaging research,” Medical Physics, vol. 37, no. 9, pp. 4902–4915, 2010.
  • [37] Y. Xu, A. Sushmit, Q. Lyu, Y. Li, X. Cao, J. S. Maltz, G. Wang, and H. Yu, “Cardiac CT motion artifact grading via semi-automatic labeling and vessel tracking using synthetic image-augmented training data,” Journal of X-Ray Science and Technology, vol. 30, no. 3, pp. 433–445, 2022.
  • [38] J. Adler, H. Kohr, and O. Öktem, “ODL – a Python framework for rapid prototyping in inverse problems,” Royal Institute of Technology, 2017.
  • [39] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
  • [40] Z. Liu, T. Bicer, R. Kettimuthu, D. Gursoy, F. De Carlo, and I. Foster, “TomoGAN: low-dose synchrotron x-ray tomography with generative adversarial networks: discussion,” JOSA A, vol. 37, no. 3, pp. 422–434, 2020.
  • [41] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.
  • [42] Stanford AIMI, “COCA – coronary calcium and chest CTs,” https://stanfordaimi.azurewebsites.net/datasets/e8ca74dc-8dd4-4340-815a-60b41f6cb2aa, 2021.
  • [43] H. Chung, J. Kim, M. T. Mccann, M. L. Klasky, and J. C. Ye, “Diffusion posterior sampling for general noisy inverse problems,” in The Eleventh International Conference on Learning Representations, 2022.
  • [44] B. Song, S. M. Kwon, Z. Zhang, X. Hu, Q. Qu, and L. Shen, “Solving inverse problems with latent diffusion models via hard data consistency,” arXiv preprint arXiv:2307.08123, 2023.