Physics-informed Score-based Diffusion Model for Limited-angle Reconstruction of Cardiac Computed Tomography (2024)


Shuo Han, Yongshun Xu, Dayang Wang, Bahareh Morovati, Li Zhou, Jonathan S. Maltz, Senior Member, IEEE, Ge Wang, Fellow, IEEE, and Hengyong Yu, Fellow, IEEE

The paper was submitted on XXXXX. This work was supported in part by NIH/NIBIB under grants R01EB032807 and R01EB034737, and NIH/NCI under grant R21CA264772.

S. Han, Y.S. Xu, D.Y. Wang, B. Morovati, L. Zhou, and H.Y. Yu are with the Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, MA 01854, USA. J.S. Maltz is with Molecular Imaging and Computed Tomography, GE Healthcare, Waukesha, WI 53188, USA. G. Wang is with the Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 21180, USA. H.Y. Yu is the corresponding author (email: hengyong-yu@ieee.org).

Abstract

Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but typically leads to severe image degradation and motivates improved reconstruction techniques. In this paper, we propose a novel physics-informed score-based diffusion model (PSDM) for limited-angle reconstruction of cardiac CT. At sampling time, we combine a data prior from a diffusion model with a model prior obtained via an iterative algorithm and Fourier fusion to further enhance image quality. Specifically, our approach integrates the primal-dual hybrid gradient (PDHG) algorithm with score-based diffusion models, thereby enabling us to reconstruct high-quality cardiac CT images from limited-angle data. Numerical simulations and real data experiments confirm the effectiveness of our proposed approach.

{IEEEkeywords}

Computed tomography (CT), limited-angle CT reconstruction, diffusion model, iterative reconstruction, deep learning

1 Introduction

\IEEEPARstart

Cardiac computed tomography (CT) is a powerful and widely deployed tool for non-invasive imaging of the beating heart and surrounding structures [1]. Specifically, electrocardiogram (ECG)-gated cardiac CT provides three-dimensional (3D) high-resolution images, enabling detailed examination of cardiac anatomy and function [2, 3]. The evolution of CT scanners toward faster rotational times and higher tube powers is largely driven by the need to freeze the motion of the beating heart. Motion artifacts may affect the diagnosis of cardiovascular disease (CVD), which is the leading cause of death globally [4]. The main cardiac vessels are the right coronary artery (RCA), supplying the right atrium and ventricle; the left anterior descending artery (LAD), supplying the anterior and apical left ventricle; and the left circumflex artery (LCX), supplying the lateral and posterior walls of the left ventricle. In addition, there are risks associated with the radiation dose applied during conventional CT imaging, prompting the exploration of alternative approaches to reduce radiation exposure [3]–[5].

Limited-angle CT (LACT) imaging can potentially address both issues by shortening the scan time, and hence the motion-sensitive time window, and by reducing patient radiation exposure. Dose reduction benefits vulnerable patient groups and patients who undergo multiple scans [5].

However, limited-angle acquisitions provide incomplete spatial frequency coverage, making it difficult to accurately reconstruct the object [6]. The reconstructed images often contain streaking artifacts and distortions, particularly along directions for which data are missing. Existing strategies to enhance limited-angle CT image quality typically rely on iterative reconstruction (IR) algorithms. These techniques optimize an objective function that usually integrates a precise system model with an image-domain prior. One popular image prior is total variation (TV) minimization [7] and its variants [8, 9]. Although IR algorithms produce images with substantially improved quality, they may lose fine details and leave residual limited-angle artifacts. In the past few years, deep learning reconstruction (DLR) has demonstrated potential to replace and supplement IR-based methods [10, 11, 12, 13]. DLR methodologies are widely applied in the image domain, the projection domain, and occasionally, in both. However, these methods usually require paired projection or image data for supervised learning, which is challenging to acquire. This constraint may hinder the practicality and generalizability of DLR in certain applications.

Recently, generative models have been at the forefront of advances in computer vision. The landscape of generative models can be broadly divided into two categories: likelihood-based models and implicit generative models. The former directly approximate the data distribution. A representative approach is the variational auto-encoder (VAE) [14], which aims to produce an output virtually identical to a given input (e.g., an image). VAEs generate highly diverse samples and are easy to train. However, the fidelity of the generated samples typically remains suboptimal, because the encoder predicts only the distribution of the latent code and training relies on a pixel-wise loss. In contrast, implicit generative models (e.g., generative adversarial networks (GANs) [15]) adopt an indirect approach to fit the data distribution: they are trained to output instances that, upon discrimination, fall within the target distribution. This strategy can generate high-fidelity samples, but sample diversity is poor because the adversarial loss does not incentivize covering the entire data distribution. Notwithstanding their profound contributions to the field, implicit generative models are often criticized for their complicated training requirements, owing primarily to their dependence on adversarial learning [16, 17]. This complexity often leads to instability and potential mode collapse, posing challenges to their broad utility in real-world applications. Despite these shortcomings, both likelihood-based and implicit generative models have demonstrated great utility for many imaging tasks.

A central problem in machine learning is modeling complicated datasets using highly flexible probabilistic distribution families, while ensuring that learning, sampling, inference, and evaluation within these families remain analytically or computationally tractable. Inspired by the principles of non-equilibrium thermodynamics [18], diffusion models (DMs) have emerged as an alternative to existing generative models, demonstrating a unique and appealing approach to generative modeling [19, 20, 21]. The basic idea of diffusion models is to systematically and gradually degrade the structure in the data distribution through an iterative forward diffusion process [19]. This degradation is then counteracted by a reverse diffusion process that restores the structure within the data, yielding a highly flexible and easily manageable generative model. Diffusion models have demonstrated significant advances in imaging tasks, for example, super-resolution [22], image inpainting [23], magnetic resonance image reconstruction [24], and CT reconstruction [25, 26]. Notably, conventional score-based diffusion models operate within the realm of unsupervised learning, offering potential advantages such as reduced data requirements and enhanced flexibility.

Nevertheless, in the field of tomographic image reconstruction, there is an underlying desire to control the final outcome: not only image denoising and artifact removal, but also the maintenance of image quality and the enhancement of imaging metrics such as contrast resolution. Our results suggest that the proposed method achieves high-quality reconstructions from limited-angle data and mitigates false structure generation, which could enhance the reliability and clinical applicability of the reconstructed images. The primal-dual hybrid gradient (PDHG) method is often more robust to noise and other perturbations than ADMM [27]. This is beneficial in limited-angle CT reconstruction, where the data are often noisy and under-sampled.

In the pursuit of high-quality image reconstruction for limited-angle CT imaging, we introduce a novel approach that we term a physics-informed score-based diffusion model (PSDM). This approach combines a model-based optimization strategy with the sampling steps of a score-based model, allowing simultaneous utilization of both data-based (as in DLR) and model-based (as in IR) priors. First, a score-based model is employed to generate an initial image, leveraging the learned score function. Subsequently, we combine the initial image and the limited-angle CT image in the Fourier domain to exploit their complementary frequency-domain characteristics. Finally, another data-consistency step is implemented via PDHG optimization. This step is instrumental in reducing the discrepancy between the initial and target images. Furthermore, isotropic TV regularization is utilized to refine the final image.

2 Background

2.1 CT imaging model

CT acquisition is primarily affected by two kinds of noise. The first type is electronic noise, which originates from the detection system and adheres to a Gaussian distribution, being independent of object structure. The second type is photon statistical noise from inherent quantum fluctuations during X-ray emission, and it usually conforms to a Poisson distribution, the mean of which, for a particular ray, is dependent on the x-ray source beam characteristics and the structure of the imaged object. In the projection domain, the noise exhibits a complicated statistical distribution that blends both Gaussian and Poisson noise:

\[
\boldsymbol{n} \sim \mathcal{N}(0,\sigma^{2}) + \mathcal{P}(\xi(\boldsymbol{x})), \tag{1}
\]

where $\boldsymbol{n}$ is the pre-log noise variable, $\mathcal{N}(0,\sigma^{2})$ denotes the Gaussian distribution with variance $\sigma^{2}$, and $\mathcal{P}(\xi(\boldsymbol{x}))$ is the Poisson distribution in which $\xi(\boldsymbol{x})$ denotes the latent mapping from the image structure $\boldsymbol{x}$ to the distribution expectation. In a general model of CT image reconstruction, $\boldsymbol{x}$ must fulfill:

\[
\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \tilde{\boldsymbol{n}}, \tag{2}
\]

where $\boldsymbol{y}\in\mathbb{R}^{M}$ is a vectorized sinogram, $\boldsymbol{x}\in\mathbb{R}^{N}$ is a vectorized CT image, $\tilde{\boldsymbol{n}}\in\mathbb{R}^{M}$ is the post-log noise corresponding to the pre-log noise $\boldsymbol{n}$, $\boldsymbol{A}\in\mathbb{R}^{M\times N}$ is the CT system matrix, and $M$ and $N$ are respectively the total numbers of measurements in the sinogram and pixels in the image domain. Because the limited-angle CT reconstruction problem is ill-posed, a standard approach to estimate the unknown image $\boldsymbol{x}$ is to employ a regularization term:

\[
\min_{\boldsymbol{x}} \frac{1}{2}\|\boldsymbol{A}\boldsymbol{x}-\boldsymbol{y}\|_{2}^{2} + \lambda R(\boldsymbol{x}). \tag{3}
\]

The first term of Eq. (3) is a data fidelity term based on the $L_{2}$-norm. The second term, $R(\boldsymbol{x})$, is the regularizer that incorporates prior information about the solution, and $\lambda$ is a trade-off parameter that balances the two terms.
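The regularized objective in Eq. (3) can be illustrated with a small numerical sketch. The example below is illustrative only: it swaps the TV regularizer for a differentiable Tikhonov term $R(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{x}\|_{2}^{2}$ and uses a random matrix in place of the CT system matrix, so that plain gradient descent applies and the result can be checked against the closed-form regularized least-squares solution.

```python
import numpy as np

# Toy sketch of Eq. (3): minimize 0.5*||Ax - y||^2 + (lam/2)*||x||^2.
# A is a stand-in for the CT system matrix; Tikhonov replaces TV here
# so the objective stays differentiable (an assumption for illustration).
rng = np.random.default_rng(0)
M, N = 30, 20                            # measurements and pixels
A = rng.standard_normal((M, N))
x_true = rng.standard_normal(N)
y = A @ x_true + 0.01 * rng.standard_normal(M)

lam = 0.5                                # trade-off parameter lambda
step = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)  # step below 1/L

x = np.zeros(N)
for _ in range(10_000):
    grad = A.T @ (A @ x - y) + lam * x   # gradient of the objective
    x = x - step * grad

# Closed-form solution of the same regularized least-squares problem.
x_closed = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
print(np.max(np.abs(x - x_closed)))      # near machine precision
```

In practice, a non-smooth prior such as TV requires proximal or primal-dual methods (as discussed in Section 3); the gradient-descent loop above only works because the Tikhonov surrogate is smooth.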

[Figure 1: Overall framework of the proposed method.]

2.2 Score-based Diffusion Model & SDEs

Score-based diffusion models are generative models in which samples from a probability density $p_{\text{data}}(\boldsymbol{x})$ can be produced via Langevin dynamics, a stochastic process used for sampling from probability distributions. In a score-based diffusion model, Langevin dynamics can be conceptualized as a methodical process for generating data points that are statistically consistent with the original dataset. It commences with an arbitrary data point and employs iterative updates to gradually align this point with the characteristics of the dataset. Central to this mechanism is the score function, $\nabla_{\boldsymbol{x}}\log p(\boldsymbol{x})$ [20], the gradient of the logarithm of the probability density. This function provides directional guidance, indicating the path towards regions of higher probability. By adhering to this guidance, Langevin dynamics effectively moves the initial arbitrary point towards areas more densely populated by actual data points.

More specifically, given a fixed step size $\delta>0$ and an initial value $\boldsymbol{x}_{0}$ (sampled from a prior distribution), the Langevin dynamics sampling process [20] can be modeled as:

\[
\boldsymbol{x}_{t} = \boldsymbol{x}_{t-1} + \frac{\delta}{2}\nabla_{\boldsymbol{x}}\log p(\boldsymbol{x}_{t-1}) + \sqrt{\delta}\,N_{t}, \tag{4}
\]

where t𝑡titalic_t represent the sampling step, and Nt𝒩(0,𝑰)similar-tosubscript𝑁𝑡𝒩0𝑰N_{t}\sim\mathcal{N}(0,\boldsymbol{I})italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∼ caligraphic_N ( 0 , bold_italic_I ) denotes random disturbance, where 𝑰𝑰\boldsymbol{I}bold_italic_I is the identity matrix. The distribution of 𝒙Tsubscript𝒙𝑇\boldsymbol{x}_{T}bold_italic_x start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT equals pdata(𝒙)subscript𝑝data𝒙p_{\text{data}}(\boldsymbol{x})italic_p start_POSTSUBSCRIPT data end_POSTSUBSCRIPT ( bold_italic_x ) when δ0𝛿0\delta\rightarrow 0italic_δ → 0 and the number of sampling steps T𝑇T\rightarrow\inftyitalic_T → ∞.From Eq.(4), to obtain samples from pdata(𝒙)subscript𝑝data𝒙p_{\text{data}}(\boldsymbol{x})italic_p start_POSTSUBSCRIPT data end_POSTSUBSCRIPT ( bold_italic_x ), we need to train a score network 𝒔𝜽(𝒙)subscript𝒔𝜽𝒙\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x})bold_italic_s start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_italic_x ) to estimate 𝒙logpdata(𝒙)subscript𝒙subscript𝑝data𝒙\nabla_{\boldsymbol{x}}\log p_{\text{data}}(\boldsymbol{x})∇ start_POSTSUBSCRIPT bold_italic_x end_POSTSUBSCRIPT roman_log italic_p start_POSTSUBSCRIPT data end_POSTSUBSCRIPT ( bold_italic_x ). This is the basic idea of score-based diffusion model.

In both vanilla [19] and score-based [20] diffusion models, the diffusion and sampling processes are manually divided into $T$ fixed steps. In reality, however, the diffusion and sampling procedures need not be deliberately segmented into steps. Instead, each can be considered a continuous transformation over time, characterized by stochastic differential equations (SDEs) [28]. This provides a generalized representation that uses continuous processes to describe diffusion models. The diffusion process using SDEs can be expressed as [28]:

\[
d\boldsymbol{x} = \boldsymbol{f}(\boldsymbol{x},t)\,dt + g(t)\,d\boldsymbol{w}, \tag{5}
\]

where $\boldsymbol{f}(\boldsymbol{x},t)$, called the drift coefficient, is a deterministic function of the data samples $\boldsymbol{x}$ and time $t$; $g(t)$ is the diffusion coefficient that introduces the stochastic component; and $d\boldsymbol{w}$ denotes the increment of a Wiener process, which introduces perturbations at any point in time and induces uncertainty. Both terms in Eq. (5) are essential to score-based diffusion models: one drives the system towards the high-density regions of the data distribution, and the other introduces randomness to ensure diversity among the generated samples. We note that $t$ in Eq. (5) is a continuous time variable, whereas it is discrete in Eq. (4); we disambiguate this notation only when necessary.

Among the possible choices of $\boldsymbol{f}$ and $g$, we choose the variance-exploding SDE (VE-SDE) [28]. Let $\boldsymbol{f}(\boldsymbol{x},t)=0$ and $g(t)=\sqrt{\frac{d[\sigma^{2}(t)]}{dt}}$. We can then simplify the forward process into a Wiener process:

\[
d\boldsymbol{x} = \sqrt{\frac{d[\sigma^{2}(t)]}{dt}}\,d\boldsymbol{w}, \tag{6}
\]

where $\sigma^{2}(t)$ is a time-dependent variance function. Because $\sigma^{2}(t)$ is designed to follow a rapidly increasing variance schedule, its derivative is guaranteed to be non-negative. In addition to the forward diffusion process, we also need to consider the reverse sampling process, in which time $t$ regresses from 1 to 0. The goal of the reverse process is to transform a simple Gaussian distribution $\mathcal{N}(0,\boldsymbol{I})$ back into the desired data distribution $p_{\text{data}}(\boldsymbol{x})$. By applying Anderson's theorem [29] to Eq. (6), we have the reverse SDE:

d𝒙=d[σ2(t)]dt𝒙logpt(𝒙)dt+d[σ2(t)]dtd𝒘¯.𝑑𝒙𝑑delimited-[]superscript𝜎2𝑡𝑑𝑡subscript𝒙subscript𝑝𝑡𝒙𝑑𝑡𝑑delimited-[]superscript𝜎2𝑡𝑑𝑡𝑑¯𝒘d\boldsymbol{x}=-\frac{d\left[\sigma^{2}(t)\right]}{dt}\nabla_{\boldsymbol{x}}%\log{p}_{t}\left(\boldsymbol{x}\right)dt+\sqrt{\frac{d\left[\sigma^{2}(t)%\right]}{dt}}d\overline{\boldsymbol{w}}.italic_d bold_italic_x = - divide start_ARG italic_d [ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_t ) ] end_ARG start_ARG italic_d italic_t end_ARG ∇ start_POSTSUBSCRIPT bold_italic_x end_POSTSUBSCRIPT roman_log italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_x ) italic_d italic_t + square-root start_ARG divide start_ARG italic_d [ italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_t ) ] end_ARG start_ARG italic_d italic_t end_ARG end_ARG italic_d over¯ start_ARG bold_italic_w end_ARG .(7)

In Eq. (7), $\overline{\boldsymbol{w}}$ denotes the reverse Wiener process. It is essential to observe that the drift term is defined by the score function $\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x})$. Hence, the problem of reverse sampling from the data distribution is transformed into a problem of score estimation. However, directly estimating the score function of a high-dimensional distribution is challenging. The denoising score matching (DSM) method allows us to estimate the score function via a denoising objective. Following the DSM framework, we minimize:

\[
\min_{\boldsymbol{\theta}} \mathbb{E}_{t,\boldsymbol{x}(t)}\left[\lambda(t)\left\|\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x}(t),t) - \nabla_{\boldsymbol{x}(t)}\log p(\boldsymbol{x}(t)\,|\,\boldsymbol{x}(0))\right\|_{2}^{2}\right]. \tag{8}
\]

In this loss function, $\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x}(t),t)$ is the score function estimated by a model (e.g., a U-Net) parameterized by $\boldsymbol{\theta}$, $\nabla_{\boldsymbol{x}(t)}\log p(\boldsymbol{x}(t)\,|\,\boldsymbol{x}(0))$ is the score of the perturbation kernel, and $\lambda(t)$ is a weighting factor that may vary with time $t$. The minimizer of this objective has the desirable property that it equals the score of the perturbed data distribution at the input location. This makes it possible to replace the score function in the reverse SDE with the learned network, so that we can approximate the reverse SDE with:

\[
d\boldsymbol{x} = -\frac{d[\sigma^{2}(t)]}{dt}\,\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x},t)\,dt + \sqrt{\frac{d[\sigma^{2}(t)]}{dt}}\,d\overline{\boldsymbol{w}}. \tag{9}
\]

By doing so, we simplify the reverse process by directly estimating the denoising score instead of the intractable exact score function. A neural network can be trained to model $\boldsymbol{s}_{\boldsymbol{\theta}}(\boldsymbol{x},t)$, with the training objective being a weighted version of the DSM loss in Eq. (8). In practice, we can leverage the variance schedule of the VE-SDE to control the amount of noise injection. Hence, the network is trained to denoise observations at different noise levels, resulting in more robust score estimation. The model's ability to handle different noise levels improves the quality of samples drawn from the learned data distribution.
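The DSM objective of Eq. (8) can be made concrete with a toy one-dimensional sketch (our own illustration, not the paper's training code): the "network" is a linear model $s(x)=ax+b$ fit by least squares, and the classical DSM result can be checked analytically, since the minimizer equals the score of the perturbed marginal, here $-(x-m)/(s^{2}+\sigma^{2})$. The names m, s, sigma below are illustrative.

```python
import numpy as np

# Toy DSM: data x0 ~ N(m, s^2), perturbation x_t = x0 + sigma * z.
# The regression target is the conditional score -(x_t - x0)/sigma^2.
rng = np.random.default_rng(0)
m, s, sigma = 1.0, 1.0, 0.5
n = 200_000

x0 = m + s * rng.standard_normal(n)          # clean samples x(0)
xt = x0 + sigma * rng.standard_normal(n)     # perturbed samples x(t)
target = -(xt - x0) / sigma**2               # score of p(x(t) | x(0))

# Least-squares fit of the linear "score network" s(x) = a*x + b.
design = np.stack([xt, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(design, target, rcond=None)

# The minimizer matches the marginal score -(x - m)/(s^2 + sigma^2),
# i.e. a -> -1/1.25 = -0.8 and b -> 1/1.25 = 0.8 for these parameters.
print(a, b)
```

Even though each regression target involves only the conditional score, the fitted model recovers the marginal score of the noisy data; this is exactly the property that lets the learned $\boldsymbol{s}_{\boldsymbol{\theta}}$ replace $\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x})$ in Eq. (9).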

In conclusion, the approach based on the VE-SDE and DSM provides a robust and effective way to estimate the score function of complex, high-dimensional distributions, enabling the transformation of a simple Gaussian distribution into the desired data distribution. The predictor-corrector (PC) sampler [28] can be used to solve Eq. (9) by discretizing the time interval $[0,1]$ into $I$ steps. In our case, $I$ is set to 1000.

2.3 Conditional predictor-corrector sampler

Unconditional generation often explores the upper limits of generative quality, while conditional generation mainly focuses on application-oriented tasks because it allows the output to be controlled according to specific requirements (e.g., generation of high-quality CT images from limited-angle projections). To address inverse problems in CT image reconstruction, conditional generation is indispensable. Generally speaking, there are two strategies for conditional generation: classifier-guidance [21] and classifier-free [30]. For CT reconstruction, the classifier-guidance approach allows the model to leverage an angular classifier that provides probabilistic conditioning guidance based on the view angles of the limited projections. This enables tighter conditioning on the measurement data and is less computationally expensive than training a new diffusion model. Hence, it is more convenient for subsequent researchers to fine-tune a pretrained model based on rich classifier gradients, thereby obviating the need to train a new model.

For limited-angle CT reconstruction, we wish to sample from a posterior distribution, defined as the conditional distribution of $\boldsymbol{x}$ given the measurement $\boldsymbol{y}$, i.e., $p(\boldsymbol{x}|\boldsymbol{y})$. According to Bayes' theorem, the score function of the posterior distribution can be computed as:

\[
\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x}|\boldsymbol{y}) = \nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x}) + \nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{y}|\boldsymbol{x}). \tag{10}
\]

Here, the likelihood enforces data consistency, thereby inducing samples that satisfy $\boldsymbol{y}=\boldsymbol{A}\boldsymbol{x}$. The calculation of Eq. (10) is decomposed into two stages: the first is a denoising step, in which the previously acquired data prior is applied to denoise the image, so we can simply use the pre-trained score function $\boldsymbol{s}_{\boldsymbol{\theta}}$; the second is projection onto the measurement subspace. Within a discrete framework, this process is:

\[
\boldsymbol{x}_{i-1}^{\prime} \leftarrow \mathrm{solve}(\boldsymbol{x}_{i-1}, \boldsymbol{s}_{\boldsymbol{\theta}}), \tag{11}
\]
\[
\boldsymbol{x}_{i} \leftarrow P_{\{\boldsymbol{x}\,|\,\boldsymbol{A}\boldsymbol{x}=\boldsymbol{y}\}}(\boldsymbol{x}_{i-1}^{\prime}), \tag{12}
\]

where "solve" represents a numerical solver of Eq. (9), and $P$ denotes the projection operator, following the projection onto convex sets (POCS) principle, that enforces the data constraint. In our case, we use the PC sampler as the solver for Eq. (11) and PDHG as the data-consistency module to solve Eq. (12). Note that, following [26, 24, 31], we apply the PDHG optimization steps directly to the noisy variables in Eq. (12); this works when sufficiently small step sizes are used to enable correction along the way. The overall framework of our proposed method appears in Fig. 1. We use a trained network $\boldsymbol{s}_{\boldsymbol{\theta}}$ to denoise and PDHG to enforce data consistency.
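The alternation of Eqs. (11) and (12) can be sketched with a toy linear system: a placeholder shrinkage stands in for the learned denoiser, and an explicit minimum-norm POCS projection stands in for the PDHG data-consistency module (both substitutions are our own, for illustration).

```python
import numpy as np

# Toy alternation of Eqs. (11)-(12): denoise, then project onto {x | Ax = y}.
rng = np.random.default_rng(0)
M, N = 10, 25                          # underdetermined, like limited-angle data
A = rng.standard_normal((M, N))
x_true = rng.standard_normal(N)
y = A @ x_true                         # noiseless measurements for this toy case

AAt_inv = np.linalg.inv(A @ A.T)       # precompute for the projection step

x = rng.standard_normal(N)             # arbitrary initialization
for _ in range(50):
    x = 0.9 * x                                  # Eq. (11): placeholder denoiser
    x = x - A.T @ (AAt_inv @ (A @ x - y))        # Eq. (12): POCS projection

print(np.max(np.abs(A @ x - y)))       # data consistency holds after projection
```

The projection is exact here because the constraint set is affine; in the paper, PDHG replaces this step so that noise and a non-smooth regularizer can be handled.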

The PC sampler, inspired by the predictor-corrector method for the numerical solution of differential equations, is utilized in the reverse process to generate new samples. Unlike traditional SDE solvers such as Euler-Maruyama (EM) and Runge-Kutta (RK) [32], this sampling strategy enhances the solution by incorporating additional information. The process involves predicting the next state of the diffusion process (the "predictor" step) and then correcting this prediction using the estimated score (the "corrector" step). In summary, the PC sampler alternates between two steps:

1. Predictor step: given the current state of the sample, predict the next state by taking a step in the direction of the gradient of the log probability of the data. This is equivalent to performing a step of gradient ascent on the log probability.

2. Corrector step: correct the prediction with a Langevin step that moves along the estimated score and adds a small amount of noise, sampled from a Gaussian distribution whose standard deviation is determined by the diffusion process.

By alternating between the above two steps, the PC sampler gradually transforms a sample in the prior distribution into a sample in the data distribution, enabling the score-based diffusion model to generate new samples that closely resemble the training data.
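The two alternating steps can be sketched on a toy one-dimensional problem in which the data distribution is $\mathcal{N}(0,1)$, so the perturbed score $-x/(1+\sigma^{2})$ is available in closed form and stands in for the trained network; the noise schedule, step count, and Langevin step size below are illustrative choices.

```python
import numpy as np

# Toy PC sampler for the reverse VE-SDE of Eq. (9) on 1-D Gaussian data.
rng = np.random.default_rng(0)

def score(x, sig):
    # Analytic score of N(0, 1) data perturbed by N(0, sig^2) noise.
    return -x / (1.0 + sig**2)

I = 200                                       # discretization steps
sigmas = np.geomspace(10.0, 0.01, I)          # decreasing noise schedule
x = sigmas[0] * rng.standard_normal(20_000)   # start from the wide prior

for i in range(1, I):
    dv = sigmas[i - 1] ** 2 - sigmas[i] ** 2  # variance removed this step
    # Predictor: one reverse-diffusion step toward the lower noise level.
    x = x + dv * score(x, sigmas[i - 1]) + np.sqrt(dv) * rng.standard_normal(x.shape)
    # Corrector: one Langevin step at the new noise level.
    eps = 0.1 * sigmas[i] ** 2                # illustrative Langevin step size
    x = x + eps * score(x, sigmas[i]) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

print(x.mean(), x.std())                      # approaches N(0, 1)
```

As the noise level anneals from $\sigma_{\max}$ down to $\sigma_{\min}$, the prior samples are gradually pulled into the data distribution, mirroring the reverse process described above.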

3 Methodology

3.1 Motivation

While the utilization of a score-based diffusion model is beneficial, it is important to acknowledge its potential limitations. Specifically, there is a risk of generating false structures or artificial elements that do not correspond to the true underlying structures in the image. This is a consequence of employing an entirely data-driven model, which relies heavily on the quality and representativeness of the input data.

To overcome this issue, we incorporate physical constraints. Physical constraints are derived from established theories or empirical observations that dictate how the data should behave, providing an additional layer of control and validation on the output of the diffusion model. This can help correct aberrations such as false structures, aligning the output more closely with the known physical truths and thus ensuring a more accurate and robust image representation. The combination of data priors and model priors has proven to be an accurate and stable approach in image reconstruction applications [33]. To combine model priors from the physics models with data priors from the diffusion model, the projection operator $P$ in Eq. (12) is replaced by the PDHG update, and a Fourier fusion strategy is applied between the diffusion-model-denoised result and the limited-angle CT image.

3.1.1 PDHG

The PDHG methodology is well established in medical image processing, and the minimization in Eq. (3) can be viewed as an optimization task. Because the objective function is usually non-differentiable, the gradient descent approach cannot be utilized directly. An alternative strategy is to apply non-smooth convex optimization techniques: proximal methods work directly with non-smooth objective functions, with a proximal step taking the place of the gradient step. In a proximal primal-dual scheme, an additional dual variable in the range of the operator is introduced, and the primal and dual variables are updated alternately. For the optimization problem below, three vector spaces are used: $I$ is the space of two-dimensional (2D) discrete images, $D$ is the space of the CT projection data, and $V$ is the space of spatial-vector-valued image arrays ($V=I^{2}$). We define the indicator function:

\[
\delta_{\mathrm{Box}(\alpha)}(x)\equiv\begin{cases}0 & \|x\|_{\infty}\leq\alpha\\ \infty & \|x\|_{\infty}>\alpha\end{cases}, \tag{13}
\]

where $\|\cdot\|_{\infty}$ is the $\infty$-norm, which selects the maximum absolute value of the components of a vector. The set $\mathrm{Box}(\alpha)$ contains vectors in which no component is greater in magnitude than $\alpha$; that is, all components lie in the interval $[-\alpha,\alpha]$. In 2D, $\mathrm{Box}(\alpha)$ is a square centered at the origin with side length $2\alpha$; for any point $(x,y)$ inside this square, both $x$ and $y$ lie within $[-\alpha,\alpha]$. Additionally, for a given vector space $X$, the notations $\mathbf{0}_{X}$ and $\mathbf{1}_{X}$ denote vectors in $X$ whose components are all set to 0 and 1, respectively. For instance, in a 2D space, $\mathbf{0}_{X}=[0,0]$ and $\mathbf{1}_{X}=[1,1]$.
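For intuition, the proximal operator of the non-smooth term $\lambda\|\cdot\|_{1}$ (the kind of step that replaces a gradient step in proximal methods) is soft-thresholding, and the Euclidean projection onto $\mathrm{Box}(\alpha)$ is a componentwise clip. The following is a small illustrative sketch, not part of the paper's pipeline:

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam * ||x||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def project_box(x, alpha):
    """Euclidean projection onto Box(alpha): clip components to [-alpha, alpha]."""
    return np.clip(x, -alpha, alpha)
```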

3.1.2 Fourier Fusion

Fourier fusion offers a promising way to mitigate limited-angle artifacts by leveraging the complementary strengths of diffusion models and Fourier-domain processing. Diffusion models excel at denoising and generating high-quality images from noisy or incomplete data. By integrating Fourier fusion into the framework, we aim to enhance spatial frequency components that are underrepresented or distorted by the generative process of the diffusion model, thus improving the overall resolution and detail of the reconstructed image. This approach utilizes a region-combination strategy in the Fourier domain, where specific frequency regions from each source are selectively merged to optimize image quality.

To ensure coherence and to maximize the use of high-quality information from the limited-angle CT image $\boldsymbol{X}_{\text{LACT}}$ (obtained with the FBP algorithm), a unified mask $M$ is adopted. $M$ is constructed based on the centered Fourier transform $\mathcal{F}$ (the Fourier transform followed by centering of the spectrum) of the limited-angle CT image. The mask identifies the frequency components that are least affected by the limited set of acquisition angles, which represent reliable information about the object's structure, and zeroes those regions in the denoised spectrum so that they are supplied by the limited-angle CT image instead. The final step in the reconstruction process applies the inverse centered Fourier transform $\mathcal{F}^{-1}$ (the inverse shift followed by the inverse transform) to the combined spectrum to convert it back to the spatial domain, thereby producing the fused image.
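A minimal NumPy sketch of this fusion step follows. The binary mask `mask` is assumed to be given (the construction described above is not shown); it zeroes the frequency regions of the denoised spectrum where the limited-angle spectrum is considered reliable.

```python
import numpy as np

def fourier_fusion(x_dn, x_lact, mask):
    """Fuse a diffusion-denoised image (x_dn) with a limited-angle FBP
    image (x_lact) in the Fourier domain.

    mask: binary array; 0 where the LACT spectrum is reliable (those regions
    of the denoised spectrum are zeroed), 1 elsewhere. Its construction is
    an assumption for this sketch.
    """
    centered_fft = lambda img: np.fft.fftshift(np.fft.fft2(img))
    inv_centered_fft = lambda spec: np.fft.ifft2(np.fft.ifftshift(spec))
    fused_spectrum = mask * centered_fft(x_dn) + centered_fft(x_lact)
    return np.real(inv_centered_fft(fused_spectrum))
```

With an all-zero mask the fused image reduces to the limited-angle image, which makes the role of the mask easy to sanity-check.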

3.2 Algorithmic Steps

A well-known primal-dual scheme is the PDHG algorithm, also referred to as the Chambolle-Pock algorithm [34]. It applies to a general form of the primal minimization and its corresponding dual maximization:

\[
\min_{x\in X}\;\bigl(F(K(x))+G(x)\bigr), \tag{14}
\]
\[
\max_{y\in Y}\;\bigl(-F^{*}(y)-G^{*}(-K^{T}y)\bigr). \tag{15}
\]

In this context, $x$ and $y$ denote vectors of finite dimensionality within the spaces $X$ and $Y$, respectively. The symbol $K$ is a linear operator mapping from $X$ to $Y$. The functions $G$ and $F$, which may not be smooth, represent convex mappings from the respective spaces $X$ and $Y$ to the set of non-negative real numbers. The superscript "$*$" represents convex conjugation.

Particularly, in the context of Eq. (3), we apply the isotropic TV semi-norm as the regularization term, denoted as $R(x)=\|(|\nabla x|)\|_{1}$. This norm has proven useful for edge-preserving regularization in the PDHG optimization algorithm [35]. The spatial-vector field $\nabla x$ corresponds to a discrete approximation of the image gradient and resides in the vector space $V$. As reported in [35], the objective function for solving Eq. (3) with PDHG can be written as:

\[
\begin{aligned}
&F(x,z)=F_{1}(x)+F_{2}(z),\\
&F_{1}(x)=\tfrac{1}{2}\|m-y\|_{2}^{2},\quad F_{2}(z)=\lambda\|(|z|)\|_{1},\\
&G(x)=0,\\
&m=Ax,\quad z=\nabla x,\\
&K=\begin{pmatrix}A\\ \nabla\end{pmatrix},
\end{aligned} \tag{16}
\]

where $x\in I$, $y\in D$, and $z\in V$. To obtain the convex conjugate of $F$, the variables $p$, $q$, and $r$ are introduced as auxiliary components that play a crucial role in the computation of the Lagrange multipliers. We then have the convex conjugates of $F$ and $G$. Calculating the convex conjugate is essential for transitioning between the primal and dual spaces, thereby facilitating the primal-dual updates in the PDHG algorithm:

\[
F^{*}(p,q)=\tfrac{1}{2}\|p\|_{2}^{2}+\langle p,y\rangle_{D}+\delta_{\mathrm{Box}(\lambda)}(|q|), \tag{17}
\]
\[
G^{*}(r)=\delta_{\mathbf{0}_{I}}(r), \tag{18}
\]

where $p\in D$, $q\in V$, and $r\in I$.

Finally, we obtain the proximal mapping of the PDHG algorithm for Eq.(3):

\[
\operatorname{prox}_{\sigma}\left[F^{*}\right](y,z)=\left(\frac{m-\sigma y}{1+\sigma},\;\frac{\lambda z}{\max\left(\lambda\mathbf{1}_{I},|z|\right)}\right), \tag{19}
\]

where $\sigma$ is the step size that controls the update of the variables in each iteration, and $\mathbf{1}_{I}$ is an image with all pixels set to 1. The proximal mapping iteratively updates the dual variables $y$ and $z$, thereby influencing the optimization of the primal variable $x$ in the PDHG algorithm and ensuring convergence to a solution that balances data fidelity and regularization.
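The two dual updates implied by this proximal mapping can be sketched as follows; the function and variable names are chosen for illustration, and the second update implements the projection onto the $\mathrm{Box}(\lambda)$ set:

```python
import numpy as np

def prox_dual_fidelity(p, Ax, y, sigma):
    """Dual proximal update for the data-fidelity term:
    p <- (p + sigma * (A x - y)) / (1 + sigma)."""
    return (p + sigma * (Ax - y)) / (1.0 + sigma)

def prox_dual_tv(q, grad_x, sigma, lam):
    """Dual proximal update for the TV term: project q + sigma * grad_x
    onto Box(lam), i.e., clip the magnitude of each component to lam."""
    v = q + sigma * grad_x
    return lam * v / np.maximum(lam, np.abs(v))
```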

Consequently, we can establish our PSDM algorithm as Algorithm 1. It should be noted that there are two iterative loops [see Eqs. (20) and (22)]: 1) the SDE denoising step, indexed by $i$; and 2) the PDHG iteration, indexed by $n$. In our case, $n$ is set to 30. We apply the Fourier fusion and then initialize the second loop with $\boldsymbol{x}=\boldsymbol{x}^{\prime}_{i}$.

\[
\boldsymbol{x}_{\text{DN}}\leftarrow\text{solve}(\boldsymbol{x}_{i},\boldsymbol{s}_{\boldsymbol{\theta}}), \tag{20}
\]
\[
\boldsymbol{x}^{\prime}_{i}\leftarrow\mathcal{F}^{-1}\left\{M\left[\mathcal{F}(\boldsymbol{x}_{\text{DN}})\right]+\mathcal{F}(\boldsymbol{X}_{\text{LACT}})\right\}, \tag{21}
\]
\[
\boldsymbol{x}_{i-1}\leftarrow\operatorname*{argmin}_{\boldsymbol{x}}\;\tfrac{1}{2}\left\|\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x}\right\|_{2}^{2}+\lambda\|(|\nabla\boldsymbol{x}|)\|_{1}. \tag{22}
\]

The purpose of the initial iteration is to transform the sample toward the learned data distribution. The succeeding iteration can be interpreted as a data-fidelity step that ensures congruity between the generated data and the provided measurements. Specifically, Fourier fusion is applied only between steps 400 and 800 of the sampling process. This timing is chosen because the image quality at the very beginning is too poor for Fourier fusion to be effective, while introducing it too late in the process, particularly at the final stages, can negatively impact the quality of the final reconstructed image.

Algorithm 1 PSDM

1: Input: score model $\boldsymbol{s}_{\boldsymbol{\theta}}\approx\nabla_{\boldsymbol{x}}\log p_{t}(\boldsymbol{x})$, $N$, $I$, $\lambda$

2: Initialization: $x_{N}\sim\mathcal{N}(0,\sigma_{T}^{2}I)$, $L\leftarrow\|(\boldsymbol{A},\nabla)\|_{2}$, $\tau\leftarrow 1/L$, $\sigma\leftarrow 1/L$, $\theta\leftarrow 1$

3: for $i=I-1$ to $0$ do

4:  $\boldsymbol{x}_{\text{DN}}\leftarrow\text{solve}(\boldsymbol{x}_{i},\boldsymbol{s}_{\boldsymbol{\theta}})$ ▷ SDE denoising

5:  $\boldsymbol{x}^{\prime}_{i}\leftarrow\mathcal{F}^{-1}\left\{M\left[\mathcal{F}(\boldsymbol{x}_{\text{DN}})\right]+\mathcal{F}(\boldsymbol{X}_{\text{LACT}})\right\}$ ▷ Fourier fusion

6:  Initialization: $p_{0},q_{0}\leftarrow 0$, $n\leftarrow 0$, $\boldsymbol{x}_{n}\leftarrow 0$, $\boldsymbol{x}^{\prime}_{n}\leftarrow\boldsymbol{x}^{\prime}_{i}$

7:  repeat

8:   $p_{n+1}\leftarrow\left(p_{n}+\sigma(\boldsymbol{A}\boldsymbol{x}^{\prime}_{n}-y)\right)/(1+\sigma)$

9:   $q_{n+1}\leftarrow\lambda\left(q_{n}+\sigma\nabla\boldsymbol{x}^{\prime}_{n}\right)/\max\left(\lambda\mathbf{1}_{I},|q_{n}+\sigma\nabla\boldsymbol{x}^{\prime}_{n}|\right)$

10:   $\boldsymbol{x}_{n+1}\leftarrow\boldsymbol{x}_{n}-\tau\boldsymbol{A}^{T}p_{n+1}+\tau\,\mathrm{div}\,q_{n+1}$

11:   $\boldsymbol{x}^{\prime}_{n+1}\leftarrow\boldsymbol{x}_{n+1}+\theta(\boldsymbol{x}_{n+1}-\boldsymbol{x}_{n})$

12:   $n\leftarrow n+1$

13:  until $n\geq N$

14:  $x_{i}\leftarrow x^{\prime}_{n+1}$ ▷ after $N$ PDHG iterations

15: return $x_{0}$
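The inner PDHG loop (lines 6-13 of Algorithm 1) can be sketched in NumPy as follows. A generic linear operator `A` and its adjoint `At` stand in for the CT projector, the finite-difference gradient/divergence pair and the operator-norm estimate `L` are assumptions, and the isotropic magnitude in the TV dual update is taken across the gradient components.

```python
import numpy as np

def grad2d(u):
    """Forward-difference image gradient with Neumann boundary."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gx, gy])

def div2d(g):
    """Discrete divergence, chosen so that <grad2d(u), g> = -<u, div2d(g)>."""
    gx, gy = g
    dx = np.zeros_like(gx)
    dy = np.zeros_like(gy)
    dx[0, :] = gx[0, :]
    dx[1:-1, :] = gx[1:-1, :] - gx[:-2, :]
    dx[-1, :] = -gx[-2, :]
    dy[:, 0] = gy[:, 0]
    dy[:, 1:-1] = gy[:, 1:-1] - gy[:, :-2]
    dy[:, -1] = -gy[:, -2]
    return dx + dy

def pdhg_tv(A, At, y, x0, lam=0.1, n_iter=30, L=1.0):
    """Inner PDHG loop (Algorithm 1, lines 6-13), sketched with a generic
    linear operator A / adjoint At in place of the CT projector."""
    tau = sigma = 1.0 / L
    theta = 1.0
    x = x0.copy()
    xbar = x0.copy()
    p = np.zeros_like(y)
    q = np.zeros_like(grad2d(x0))
    for _ in range(n_iter):
        # Dual updates (lines 8-9).
        p = (p + sigma * (A(xbar) - y)) / (1.0 + sigma)
        v = q + sigma * grad2d(xbar)
        mag = np.sqrt((v ** 2).sum(axis=0, keepdims=True))
        q = lam * v / np.maximum(lam, mag)
        # Primal update and over-relaxation (lines 10-11).
        x_new = x - tau * At(p) + tau * div2d(q)
        xbar = x_new + theta * (x_new - x)
        x = x_new
    return xbar
```

With `A` set to the identity this reduces to TV denoising, which is a convenient way to sanity-check the implementation.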


4 Experiments and results

In this section, we first provide an overview of our cardiac CT simulation dataset and outline the essential steps for training and evaluating the proposed method. We then compare our results with those obtained from Diffusion-MBIR, another state-of-the-art diffusion model used for CT reconstruction. We also conduct a comparison against several fully-supervised methods; specifically, we use TomoGAN (a GAN-based generative model) and FBPConvNet as our baselines. Finally, we include PDHG-TV, an iterative method with total-variation regularization, in our study.

4.1 Numerical simulations

4.1.1 Dataset preparation and network training

We use the 4D Extended Cardiac-Torso (XCAT) phantom version 2 to generate cardiac CT phantom data. XCAT is a multimodal phantom developed at Duke University [36, 37]. It can produce a 4D attenuation-coefficient phantom that simulates a patient with a beating heart, from which a dynamic cardiac CT scan is generated. We empirically calibrate the tube current-time product (mAs) so that the noise level of the simulated CT images matches that of real cardiac CT images; in our study, the mAs per projection is set to 0.25. We model a single cardiac cycle as lasting one second. Within this framework, each cardiac cycle is divided into 200 discrete phases sampled at equal time intervals to ensure precise temporal resolution. Using these parameters, we generate simulated images for 10 individuals and reconstruct images at phases ranging from 20% to 80% with an interval of 12%. We employ an equiangular cone-beam projection geometry based on a model of a GE CT scanner. The detector array comprises 835 elements, and a complete scan encompasses 984 views. The field of view (FOV) covers 50 × 50 $\mathrm{cm^{2}}$. The distance between the X-ray source and the origin is set to 53.852 $\mathrm{cm}$. In this study, a cross-validation approach is employed to train our model, which is based on the noise conditional score network plus plus (NCSNPP) [28]. The database comprises 10 patient datasets, each containing 2D slices of 256 × 256 pixels, for a total of 3,714 slices. In each fold of the cross-validation, nine patient datasets (3,362 slices) are used to train the model, while the remaining patient dataset (352 slices) is used for testing. This process is repeated 10 times, with each patient dataset being used exactly once as the testing set.
Specifically, in the simulation results section, phase 100 of patient 153 is labeled as Case 1, and phase 112 as Case 2.

For inference, we adopt the PC sampling strategy with $I=1{,}000$. In every iteration, data consistency is enforced by the PDHG algorithm with a sequence of 30 iterations. The experiments are conducted on a high-performance workstation equipped with an Intel i9-9920X CPU operating at 3.50 GHz and an NVIDIA GeForce 2080 Ti graphics processing unit (GPU). The proposed method is implemented in the PyTorch framework and optimized using the Adam optimizer. The batch size is set to 1, and the network is trained for 200 epochs. Notably, the Operator Discretization Library (ODL) [38] is utilized to implement the PDHG algorithm, and the ASTRA toolbox is adopted to facilitate fast forward and backward projections for image reconstruction in X-ray tomography.

4.1.2 Simulation Results

Fig. 2 shows the representative reconstruction results for patient 153 at phases 100 and 112, utilizing different methods for 120-degree limited-angle reconstruction. The first and third rows display the reconstruction results, while the second and fourth rows provide the corresponding SSIM maps for comparative analysis. Under the 120-degree condition, most of the methods achieve satisfactory results, except for FBP and PDHG-TV, which are affected by strong limited-angle artifacts.


FBPConvNet and TomoGAN show compromised edge quality along oblique directions, while the Diffusion-MBIR and PSDM demonstrate superior edge preservation. However, the Diffusion-MBIR images lack the finer details that are perceivable in PSDM images.

For further comparison, two regions of interest (ROIs) are extracted from Case 1 and Case 2 and magnified in Fig. 3. Upon close examination of the structures, specifically those indicated by the red arrows, it is evident that the images are blurred and distorted when processed by PDHG-TV, FBPConvNet, and TomoGAN. Conversely, Diffusion-MBIR and PSDM yield superior image quality. However, it should be noted that the Diffusion-MBIR images are not entirely devoid of imperfections, as they retain some noise and exhibit minor distortions at the locations indicated by the red arrows. Furthermore, in the area indicated by the red circles, PSDM provides a clearer reconstruction of the left anterior descending artery (LAD) branches than Diffusion-MBIR.

Table 1. Quantitative results for 120-view and 90-view limited-angle reconstruction.

| Method | PSNR ↑ (120-view) | SSIM ↑ | HC ↑ | LBP-TS ↓ | PSNR ↑ (90-view) | SSIM ↑ | HC ↑ | LBP-TS ↓ |
|---|---|---|---|---|---|---|---|---|
| PDHG-TV [35] | 27.856 | 0.860 | 0.911 | 0.191 | 24.067 | 0.765 | 0.498 | 0.201 |
| FBPConvNet [39] | 30.552 | 0.912 | 0.957 | 0.023 | 27.82 | 0.870 | 0.821 | 0.008 |
| TomoGAN [40] | 24.698 | 0.854 | 0.781 | 0.009 | 22.861 | 0.818 | 0.813 | 0.007 |
| Diffusion-MBIR [26] | 32.991 | 0.902 | 0.797 | 0.011 | 27.01 | 0.827 | 0.743 | 0.01 |
| PSDM (ours) | 35.56 | 0.938 | 0.942 | 0.0008 | 30.60 | 0.901 | 0.912 | 0.0015 |

We conduct further tests using a 90-degree limited-angle dataset. The results, along with the magnified ROI images, are presented in Fig. 4. We observe that the PDHG-TV method exhibits noticeable limited-angle artifacts. Both FBPConvNet and TomoGAN demonstrate varying degrees of deformation at the diagonal edges. The diffusion-based methods appear more stable, without evident structural deformation. Similar to the 120-degree limited-angle reconstruction results, our PSDM method outperforms Diffusion-MBIR in reconstructing finer structural details in the LAD area, as highlighted by the red circles.

4.1.3 Quantitative Evaluation

A quantitative assessment is performed for the different competing reconstruction techniques. In addition to the commonly used SSIM and PSNR, we also employ histogram correlation (HC) and local-binary-pattern-based texture similarity (LBP-TS) as evaluation metrics. HC measures the similarity between the histograms of the reconstruction results and the labels. A high correlation indicates that the images have similar intensity distributions, which is crucial for tasks like CT reconstruction where maintaining the intensity distribution is desirable. LBP-TS evaluates the texture similarity between two images by extracting local binary patterns from them and comparing the patterns [41]; smaller values indicate higher textural similarity.
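As an illustration, HC can be computed as the Pearson correlation between intensity histograms. This is a sketch; the bin count and intensity range are arbitrary choices here, not the paper's settings.

```python
import numpy as np

def histogram_correlation(img_a, img_b, bins=64, value_range=(0.0, 1.0)):
    """Histogram correlation (HC): Pearson correlation between the intensity
    histograms of two images. bins and value_range are illustrative choices."""
    ha, _ = np.histogram(img_a, bins=bins, range=value_range)
    hb, _ = np.histogram(img_b, bins=bins, range=value_range)
    ha = ha.astype(float) - ha.mean()
    hb = hb.astype(float) - hb.mean()
    denom = np.sqrt((ha ** 2).sum() * (hb ** 2).sum())
    return float((ha * hb).sum() / denom) if denom > 0 else 1.0
```

Two images with identical intensity distributions give HC = 1, and the value decreases as the distributions diverge.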

The images reconstructed in the two scenarios (120° and 90°) are analyzed, and the quantitative metrics are presented in Table 1. Consistent with the visual impressions, PSDM yields more favorable quantitative outcomes than the PDHG-TV, FBPConvNet, TomoGAN, and Diffusion-MBIR methods. In particular, PSDM obtains the highest PSNR, SSIM, and HC values in all cases, and consistently achieves the smallest (most desirable) LBP-TS values. The empirical findings from this quantitative analysis provide a substantive validation of the merits of the PSDM method, underlining its superior performance for limited-angle image reconstruction under the current test conditions.

4.2 Experiments on clinical data

For cardiac CT imaging, it is crucial to freeze the beating heart to avoid motion artifacts. However, conventional ECG gating does not work well for patients with tachycardia or arrhythmias, and the radiation exposure is relatively high due to the overlapped continuous scanning needed for retrospective gating. By using limited-angle reconstruction techniques, temporal resolution can be significantly improved, allowing clearer visualization of fast-moving structures such as heart valves and myocardial walls.

4.2.1 Stanford AIMI COCA results

To further our evaluation on clinical data, the Stanford AIMI COCA dataset [42] is employed. The COCA dataset is a compilation of 789 gated CT scans, each corresponding to an individual patient. On average, every patient contributes roughly 50 image slices, each with a resolution of 512 × 512 pixels. The NCSNPP model is trained with data from 60 patients, totaling approximately 3,400 image slices. Apart from the image size, all other parameters remain consistent with the simulation training. The reconstruction results from 120 views appear in Fig. 5. Notably, our proposed method produces images of enhanced quality characterized by prominent features and sharp boundaries; this is especially discernible in the areas and structures underscored by the red circles and arrows. In contrast, the other algorithms tend to obscure certain details and introduce blurring.


4.2.2 Clinical cardiac results

Generalization is a major issue for deep-learning-based reconstruction methods, especially for models requiring supervised learning. If a model is trained strictly on data with a specific angular range (e.g., 0-120°), it will not generalize well to other angular ranges. However, different clinical situations may require different angular ranges; for instance, certain anatomical regions or pathologies are better visualized with a specific angular range. To further demonstrate the advantages of PSDM, the algorithm is applied to a real clinical cardiac dataset. With the approval of the Institutional Review Board of the University of Massachusetts Lowell, a deidentified clinical cardiac CT dataset is obtained from a GE HD 750 scanner. The patient was scanned in axial mode, with 1,520 projections acquired over an angular range of 556° (≈ 1.54 rotations). Each projection row has 835 elements at a 1.095 mm pitch. The source-to-rotation-center distance is 538.5 mm, and the source-to-detector distance is 946.7 mm. Reconstruction is performed by FBP with an equiangular geometry, and the image size is set to 512 × 512 to accelerate the sampling process. Since the models trained by supervised learning (FBPConvNet and TomoGAN) can usually only process a specific angular range, we compare only the two unsupervised methods: Diffusion-MBIR and PSDM. During the sampling stage, both methods directly utilize the checkpoints trained on the aforementioned Stanford AIMI COCA dataset. Fig. 6 shows the 120-view limited-angle reconstruction images from different angular ranges. We find that both Diffusion-MBIR and PSDM produce consistent and apparently acceptable results. However, compared with Diffusion-MBIR, the results produced by PSDM exhibit fewer limited-angle artifacts and appear to provide a more realistic representation. These differences can be seen in the areas and structures underscored by the red circles and arrows.
Specifically, in the atrial region denoted by the red arrow, PSDM produces more stable results, while Diffusion-MBIR still exhibits limited-angle artifacts in the atria.


4.3 Ablation study

In the ablation study, we explore the impact of various components and parameters of our proposed method, focusing on three aspects: the number of reverse diffusion steps, the effect of the TV term, and how the Fourier fusion module affects the denoising process. This examination is crucial to understanding the contribution of each element to the overall performance of the system.

[Fig. 7: reconstruction results with and without the TV term.]
[Fig. 8: comparison of PSDM and Diffusion-MBIR versus the number of reverse diffusion steps.]

Fig. 7 presents a comparative analysis of the reconstruction results with and without the TV term. The quantitative metrics (SSIM and PSNR) indicate that the TV term improves both the perceptual quality and the fidelity of the reconstructed images. Visually, results with the TV term exhibit less noise and more consistent structural integrity than those without the TV prior. Fig. 8 compares PSDM and Diffusion-MBIR with respect to the number of reverse diffusion steps. PSDM demonstrates faster convergence than Diffusion-MBIR; specifically, between 600 and 800 steps, Diffusion-MBIR exhibits higher noise levels than PSDM. Fig. 9 provides a quantitative comparison of the performance with and without the Fourier fusion module. Between 400 and 800 reverse steps, PSDM with the Fourier fusion module outperforms the variant without it. Ultimately, both the PSNR and SSIM metrics show a slight improvement when the Fourier fusion module is utilized, highlighting its benefit.
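The general idea behind Fourier-domain fusion of a model-based image and a diffusion sample can be sketched as follows. This is a minimal illustration, not the paper's exact rule: the cutoff `radius` and the sharp low-pass/high-pass split are hypothetical choices made here for clarity.

```python
import numpy as np

def fourier_fusion(x_model, x_diffusion, radius=0.2):
    """Blend two images in the Fourier domain: keep the low-frequency
    band (data-consistent global structure) from the model-based
    reconstruction and the high-frequency band (fine texture) from the
    diffusion sample. `radius` is an assumed cutoff, expressed as a
    fraction of the Nyquist frequency."""
    H, W = x_model.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    low_pass = (np.sqrt(fx**2 + fy**2) <= radius * 0.5).astype(float)

    F_model = np.fft.fft2(x_model)
    F_diff = np.fft.fft2(x_diffusion)
    fused = low_pass * F_model + (1.0 - low_pass) * F_diff
    return np.real(np.fft.ifft2(fused))

# Sanity check: fusing an image with itself must return the image.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
out = fourier_fusion(img, img)
print(np.allclose(out, img))  # True
```

In practice a soft (e.g., Gaussian-tapered) mask would avoid ringing at the band boundary; the hard mask above is kept only for readability.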

[Fig. 9: quantitative comparison of performance with and without the Fourier fusion module.]

5 Discussion & Conclusion

In the realm of language models, we have witnessed the rise of models such as ChatGPT, which have achieved impressive generalizability. Similarly, the next leap in imaging may be a generalized image model, versatile across tasks and less reliant on specific datasets. While current computational imaging methods merge traditional and modern data-driven techniques, they often falter in unfamiliar conditions due to dataset dependencies. This highlights the need for diverse datasets, which are hard to procure in areas such as cardiac CT. The score-based diffusion model offers a solution: it can generate new samples from a learned data distribution without requiring a network architecture specific to the problem at hand. Usually a standard architecture such as the U-Net can be employed as the denoiser, and adversarial learning is not required. Thus, diffusion models fill an essential gap in the landscape of generative models, offering an efficient and uncomplicated approach to modeling complex data distributions. Such a streamlined design, combined with a straightforward training process, offers a powerful alternative to models that require either intricate design or complex adversarial training. As an unsupervised deep learning model, it is adaptable and less dataset-dependent. By incorporating physics-based methods, it emphasizes understanding the physical processes rather than relying on data alone, broadening the range of conditions it can handle and enhancing generalizability.

The combination of PDHG and the SDE framework for CT reconstruction suggests that PDHG could potentially be replaced with another model-based reconstruction algorithm. However, PDHG, known for its effectiveness in solving non-smooth optimization problems, particularly in image processing, may have a specific synergy with the SDE-based approach. First, the primal-dual nature of PDHG could be especially adept at handling the noise model or the data fidelity term as formulated in the SDE approach. Second, PDHG has strong convergence properties, especially for convex optimization problems with non-smooth terms. If SDE-based denoising leads to an optimization landscape that benefits from these convergence properties, this would be a significant justification for using PDHG over other algorithms.
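To make the PDHG iteration pattern concrete, the following sketch applies the Chambolle-Pock primal-dual updates [34] to a 1-D TV denoising problem. The identity forward operator and the step sizes are simplifying assumptions for illustration; in PSDM the operator is the limited-angle projection matrix and the problem is solved per sampling step.

```python
import numpy as np

def pdhg_tv(b, lam=0.5, n_iter=500):
    """Chambolle-Pock PDHG for min_x 0.5*||x - b||^2 + lam*||Dx||_1,
    a 1-D TV denoising toy standing in for the CT data-fidelity
    subproblem (here A = identity; in PSDM A is the projector)."""
    n = len(b)
    # Forward-difference operator D (the non-smooth TV term acts on Dx).
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0

    L = 2.0                  # upper bound on ||D|| for finite differences
    tau = sigma = 0.9 / L    # tau * sigma * L^2 < 1 ensures convergence

    x = b.copy()
    x_bar = b.copy()
    y = np.zeros(n - 1)
    for _ in range(n_iter):
        # Dual ascent, then projection onto the l_inf ball of radius lam
        # (the prox of the conjugate of lam*||.||_1).
        y = np.clip(y + sigma * (D @ x_bar), -lam, lam)
        # Primal descent with the prox of 0.5*||x - b||^2.
        x_new = (x - tau * (D.T @ y) + tau * b) / (1.0 + tau)
        # Over-relaxation step characteristic of Chambolle-Pock.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

rng = np.random.default_rng(1)
truth = np.repeat([0.0, 1.0, 0.0], 50)   # piecewise-constant phantom
noisy = truth + 0.2 * rng.standard_normal(truth.size)
denoised = pdhg_tv(noisy)
print(np.abs(denoised - truth).mean() < np.abs(noisy - truth).mean())  # True
```

The same primal-dual structure carries over to the CT problem by replacing the identity with the system matrix and stacking the projection and gradient operators into one linear map.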

To demonstrate the advantage of PDHG, a set of experiments is conducted to evaluate its importance within PSDM. The major difference between PSDM and Diffusion-MBIR is the data-consistency component: PSDM employs the PDHG algorithm, which demonstrates superior effectiveness and robustness compared to the ADMM algorithm used in Diffusion-MBIR. It is well known that the number of iterations is an important parameter in optimization algorithms. Typically, increasing the number of iterations improves the final results, but at the cost of a significant rise in computational time. Fig. 10 compares the time required by PSDM and Diffusion-MBIR to sample one image as a function of the number of iterations. As the iteration number increases, the computational cost of Diffusion-MBIR increases dramatically, whereas the sampling time of the proposed PSDM remains relatively stable and acceptable. Furthermore, Fig. 11 reveals that Diffusion-MBIR initially achieves superior PSNR and SSIM values compared to PSDM. However, as the iteration number grows, PSDM shows a significant improvement in both PSNR and SSIM, while Diffusion-MBIR appears to plateau, suggesting a performance ceiling.

[Fig. 10: sampling time per image for PSDM and Diffusion-MBIR versus the number of iterations.]
[Fig. 11: PSNR and SSIM of PSDM and Diffusion-MBIR versus the number of iterations.]

However, while PSDM exhibits a significantly reduced sampling time compared to Diffusion-MBIR, it falls short in applications requiring real-time artifact removal. This is a crucial aspect to address, because accelerating the sampling procedure could greatly enhance the model’s applicability in time-sensitive tasks, such as clinical settings where quick decision-making is essential. While our proposed approach is empirically driven, we continue to investigate the supporting theory. Although PSDM shows good performance on a variety of datasets, its adaptability has limits and there is room to further improve image quality, indicating a need for further refinement of its ability to generalize across more diverse data types. Theoretically, applying projection steps to the noisy variable, as done in both Diffusion-MBIR and PSDM, is not ideal, because the noisy variable does not belong to the subspace associated with the system matrix $A$ due to the added noise. In the future, we plan to adopt a more accurate probabilistic formulation such as DPS [43]. DPS estimates the mean of $x_0$ from $x_t$ and applies the data fidelity term to the estimated $x_0$. However, this introduces a Jacobian term, and a hard data-consistency method has been proposed to avoid it [44]. We will consider incorporating this approach in the future.
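The DPS-style estimate of $x_0$ from $x_t$ mentioned above rests on Tweedie's formula: under Gaussian noising, the posterior mean of $x_0$ given $x_t$ is $x_t + \sigma_t^2 \, \nabla \log p_t(x_t)$. A toy numerical check, assuming Gaussian data so that the score is available in closed form (a network supplies it in practice):

```python
import numpy as np

rng = np.random.default_rng(7)
s, sig = 1.0, 0.5                      # data std and noise level sigma_t
n = 100_000

x0 = s * rng.standard_normal(n)        # clean "data"
xt = x0 + sig * rng.standard_normal(n) # noised variable at level sigma_t

def score(x):
    # Analytic score of p_t = N(0, s^2 + sig^2) for this toy case.
    return -x / (s**2 + sig**2)

# Tweedie's formula: posterior mean E[x0 | xt].
x0_hat = xt + sig**2 * score(xt)

# The posterior-mean estimate beats the naive guess xt in MSE.
mse_hat = float(np.mean((x0_hat - x0) ** 2))
mse_raw = float(np.mean((xt - x0) ** 2))
print(mse_hat < mse_raw)  # True
```

Applying data fidelity to this denoised estimate, rather than to the noisy $x_t$ itself, is precisely what avoids projecting a variable that lies outside the range of the forward model.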

In conclusion, while data-driven methods have already revolutionized the field of imaging and computer vision, their limitations in terms of generalizability and data dependency cannot be overlooked. The physics-informed score-based diffusion model, by synergizing data- and model-based methods, offers a promising direction, emphasizing the importance of understanding the underlying physics. This approach not only mitigates the challenges posed by data dependency but also broadens the horizons for tackling diverse conditions in cardiac CT reconstruction, without a significant increase in computational cost.

References

  • [1] K. Korsholm, S. Berti, X. Iriart, J. Saw, D. D. Wang, H. Cochet, D. Chow, A. Clemente, O. De Backer, J. Møller Jensen et al., “Expert recommendations on cardiac computed tomography for planning transcatheter left atrial appendage occlusion,” Cardiovascular Interventions, vol. 13, no. 3, pp. 277–292, 2020.
  • [2] D. Bharkhada, H. Yu, and G. Wang, “Knowledge-based dynamic volumetric cardiac computed tomography with saddle curve trajectory,” Journal of Computer Assisted Tomography, vol. 32, no. 6, p. 942, 2008.
  • [3] G. Wang, S. Zhao, and D. Heuscher, “A knowledge-based cone-beam x-ray CT algorithm for dynamic volumetric cardiac imaging,” Medical Physics, vol. 29, no. 8, pp. 1807–1822, 2002.
  • [4] G. A. Roth, G. A. Mensah, C. O. Johnson, G. Addolorato, E. Ammirati, L. M. Baddour, N. C. Barengo, A. Z. Beaton, E. J. Benjamin, C. P. Benziger et al., “Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study,” Journal of the American College of Cardiology, vol. 76, no. 25, pp. 2982–3021, 2020.
  • [5] D. J. Brenner and E. J. Hall, “Computed tomography – an increasing source of radiation exposure,” New England Journal of Medicine, vol. 357, no. 22, pp. 2277–2284, 2007.
  • [6] L. Li, K. Kang, Z. Chen, L. Zhang, Y. Xing, H. Yu, and G. Wang, “An alternative derivation and description of Smith’s data sufficiency condition for exact cone-beam reconstruction,” Journal of X-Ray Science and Technology, vol. 16, no. 1, pp. 43–49, 2008.
  • [7] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Physics in Medicine & Biology, vol. 53, no. 17, p. 4777, 2008.
  • [8] S. Niu, Y. Gao, Z. Bian, J. Huang, W. Chen, G. Yu, Z. Liang, and J. Ma, “Sparse-view x-ray CT reconstruction via total generalized variation regularization,” Physics in Medicine & Biology, vol. 59, no. 12, p. 2997, 2014.
  • [9] Z. Zhang, B. Chen, D. Xia, E. Y. Sidky, and X. Pan, “Directional-TV algorithm for image reconstruction from limited-angular-range data,” Medical Image Analysis, vol. 70, p. 102030, 2021.
  • [10] G. Wang, J. C. Ye, and B. De Man, “Deep learning for tomographic image reconstruction,” Nature Machine Intelligence, vol. 2, no. 12, pp. 737–748, 2020.
  • [11] B. Morovati, R. Lashgari, M. Hajihasani, and H. Shabani, “Reduced deep convolutional activation features (R-DeCAF) in histopathology images to improve the classification performance for breast cancer diagnosis,” Journal of Digital Imaging, Aug. 2023.
  • [12] S. Han, Y. Zhao, F. Li, D. Ji, Y. Li, M. Zheng, W. Lv, X. Xin, X. Zhao, B. Qi et al., “Dual-path deep learning reconstruction framework for propagation-based x-ray phase-contrast computed tomography with sparse-view projections,” Optics Letters, vol. 46, no. 15, pp. 3552–3555, 2021.
  • [13] F. Li, Y. Zhao, S. Han, D. Ji, Y. Li, M. Zheng, W. Lv, J. Jian, X. Zhao, and C. Hu, “Physics-informed deep neural network reconstruction framework for propagation-based x-ray phase-contrast computed tomography with sparse-view projections,” Optics Letters, vol. 47, no. 16, pp. 4259–4262, 2022.
  • [14] D. P. Kingma, M. Welling et al., “An introduction to variational autoencoders,” Foundations and Trends in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.
  • [15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
  • [16] M. Arjovsky and L. Bottou, “Towards principled methods for training generative adversarial networks,” in International Conference on Learning Representations, 2016.
  • [17] Q. Wu, R. Gao, and H. Zha, “Bridging explicit and implicit deep generative models via neural Stein estimators,” in Advances in Neural Information Processing Systems, 2021.
  • [18] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning. PMLR, 2015, pp. 2256–2265.
  • [19] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
  • [20] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [21] P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 8780–8794, 2021.
  • [22] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, “Image super-resolution via iterative refinement,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4713–4726, 2022.
  • [23] B. Kawar, M. Elad, S. Ermon, and J. Song, “Denoising diffusion restoration models,” Advances in Neural Information Processing Systems, vol. 35, pp. 23593–23606, 2022.
  • [24] H. Chung, B. Sim, and J. C. Ye, “Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12413–12422.
  • [25] Y. Song, L. Shen, L. Xing, and S. Ermon, “Solving inverse problems in medical imaging with score-based generative models,” in International Conference on Learning Representations, 2021.
  • [26] H. Chung, D. Ryu, M. T. McCann, M. L. Klasky, and J. C. Ye, “Solving 3D inverse problems using pre-trained 2D diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22542–22551.
  • [27] N. Komodakis and J.-C. Pesquet, “Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems,” IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 31–54, 2015.
  • [28] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021.
  • [29] B. D. Anderson, “Reverse-time diffusion equation models,” Stochastic Processes and their Applications, vol. 12, no. 3, pp. 313–326, 1982.
  • [30] J. Ho and T. Salimans, “Classifier-free diffusion guidance,” in NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021.
  • [31] H. Chung and J. C. Ye, “Score-based diffusion models for accelerated MRI,” Medical Image Analysis, vol. 80, p. 102479, 2022.
  • [32] P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations. Springer Science & Business Media, 2013, vol. 23.
  • [33] W. Wu, D. Hu, W. Cong, H. Shan, S. Wang, C. Niu, P. Yan, H. Yu, V. Vardhanabhuti, and G. Wang, “Stabilizing deep tomographic reconstruction: Part A. Hybrid framework and experimental results,” Patterns, vol. 3, no. 5, 2022.
  • [34] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011.
  • [35] E. Y. Sidky, J. H. Jørgensen, and X. Pan, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle–Pock algorithm,” Physics in Medicine & Biology, vol. 57, no. 10, p. 3065, 2012.
  • [36] W. P. Segars, G. Sturgeon, S. Mendonca, J. Grimes, and B. M. Tsui, “4D XCAT phantom for multimodality imaging research,” Medical Physics, vol. 37, no. 9, pp. 4902–4915, 2010.
  • [37] Y. Xu, A. Sushmit, Q. Lyu, Y. Li, X. Cao, J. S. Maltz, G. Wang, and H. Yu, “Cardiac CT motion artifact grading via semi-automatic labeling and vessel tracking using synthetic image-augmented training data,” Journal of X-Ray Science and Technology, vol. 30, no. 3, pp. 433–445, 2022.
  • [38] J. Adler, H. Kohr, and O. Öktem, “ODL – a Python framework for rapid prototyping in inverse problems,” Royal Institute of Technology, 2017.
  • [39] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
  • [40] Z. Liu, T. Bicer, R. Kettimuthu, D. Gursoy, F. De Carlo, and I. Foster, “TomoGAN: low-dose synchrotron x-ray tomography with generative adversarial networks: discussion,” JOSA A, vol. 37, no. 3, pp. 422–434, 2020.
  • [41] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.
  • [42] Stanford AIMI, “COCA – coronary calcium and chest CTs,” https://stanfordaimi.azurewebsites.net/datasets/e8ca74dc-8dd4-4340-815a-60b41f6cb2aa, 2021.
  • [43] H. Chung, J. Kim, M. T. Mccann, M. L. Klasky, and J. C. Ye, “Diffusion posterior sampling for general noisy inverse problems,” in The Eleventh International Conference on Learning Representations, 2022.
  • [44] B. Song, S. M. Kwon, Z. Zhang, X. Hu, Q. Qu, and L. Shen, “Solving inverse problems with latent diffusion models via hard data consistency,” arXiv preprint arXiv:2307.08123, 2023.