<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Multimodal Learning | Chenglu Zhu</title><link>https://hzzcl.github.io/resume.io/tags/multimodal-learning/</link><atom:link href="https://hzzcl.github.io/resume.io/tags/multimodal-learning/index.xml" rel="self" type="application/rss+xml"/><description>Multimodal Learning</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 21 May 2024 00:00:00 +0000</lastBuildDate><image><url>https://hzzcl.github.io/resume.io/media/icon_hu1c04a90d961651ebaa864f5d44daa878_19395_512x512_fill_lanczos_center_3.png</url><title>Multimodal Learning</title><link>https://hzzcl.github.io/resume.io/tags/multimodal-learning/</link></image><item><title>Reconstructing Computational Paradigms for Pathological Image Analysis</title><link>https://hzzcl.github.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/</link><pubDate>Tue, 21 May 2024 00:00:00 +0000</pubDate><guid>https://hzzcl.github.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/</guid><description>&lt;p>The gigapixel scale of Whole Slide Images (WSI), the chronic absence of clinical multimodal data, and the &amp;ldquo;compute wall&amp;rdquo; for fine-tuning large models constitute the &amp;ldquo;Three Major Hurdles&amp;rdquo; restricting the development of high-precision pathological AI.&lt;/p>
&lt;p>This project reconstructs the computational paradigm of pathological image analysis across three dimensions: &lt;strong>low-rank architectural breakthroughs&lt;/strong>, &lt;strong>robust fusion mechanisms for missing modalities&lt;/strong>, and &lt;strong>task-specific efficient fine-tuning&lt;/strong>.&lt;/p>
&lt;h2 id="1-breaking-the-low-rank-bottleneck-in-long-sequences-longmil">1. Breaking the &amp;ldquo;Low-Rank&amp;rdquo; Bottleneck in Long Sequences (LongMIL)&lt;/h2>
&lt;p>&lt;strong>Original Paper:&lt;/strong> &lt;em>Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis&lt;/em> (NeurIPS 2024)
&lt;br>&lt;strong>Authors:&lt;/strong> Honglin Li, Yunlong Zhang, Pingyi Chen, Zhongyi Shui, Chenglu Zhu, Lin Yang&lt;/p>
&lt;h3 id="the-scientific-question-the-transformers-achilles-heel-in-wsi">The Scientific Question: The Transformer&amp;rsquo;s &amp;ldquo;Achilles&amp;rsquo; Heel&amp;rdquo; in WSI&lt;/h3>
&lt;p>When processing WSIs containing tens of thousands of patches, traditional Transformers face two critical challenges:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Explosive Complexity:&lt;/strong> The $O(N^2)$ complexity of standard Self-Attention makes memory usage unsustainable.&lt;/li>
&lt;li>&lt;strong>Low-Rank Bottleneck:&lt;/strong> We theoretically revealed that when sequence length $N$ far exceeds embedding dimension $D$, the attention matrix exhibits mathematical &amp;ldquo;low-rank&amp;rdquo; properties. This means attention maps become homogenized, failing to capture fine-grained local microenvironmental differences.&lt;/li>
&lt;/ol>
&lt;h3 id="core-method-local-global-hybrid-attention">Core Method: Local-Global Hybrid Attention&lt;/h3>
&lt;p>To break the rank limit and reduce computation, we propose the &lt;strong>LongMIL&lt;/strong> architecture:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Local Attention Mask:&lt;/strong> By introducing local window constraints, we force the model to focus on interactions within local neighborhoods. Theory proves this sparsification significantly increases the &lt;strong>Rank&lt;/strong> of the attention matrix.&lt;/li>
&lt;li>&lt;strong>Linear Complexity:&lt;/strong> Utilizing a Chunked Computation strategy reduces complexity from quadratic $O(N^2)$ to linear $O(N \times w)$ (where $w$ is window size).&lt;/li>
&lt;li>&lt;strong>Dual-Stream Architecture:&lt;/strong> A &amp;ldquo;Local-First, Global-Second&amp;rdquo; design captures cell community features before aggregating slide-level information.&lt;/li>
&lt;/ul>
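&lt;p>A minimal sketch of the windowed-attention idea (our simplification: queries, keys, and values all share the same features, and real implementations batch the windows rather than looping):&lt;/p>

```python
import numpy as np

def local_window_attention(x, w):
    """Sketch of windowed attention in the spirit of LongMIL (simplified:
    queries, keys and values all reuse x). Each of the N tokens attends
    only to its 2w+1 neighbours, so the cost scales as O(N * w) rather
    than the O(N^2) of full self-attention."""
    N, D = x.shape
    out = np.empty_like(x)
    for i in range(N):                                 # one chunk per token
        lo, hi = max(0, i - w), min(N, i + w + 1)      # local window bounds
        scores = x[lo:hi] @ x[i] / np.sqrt(D)          # local logits only
        weights = np.exp(scores - scores.max())        # stable softmax
        out[i] = (weights / weights.sum()) @ x[lo:hi]  # neighbourhood mix
    return out

x = np.random.default_rng(0).standard_normal((512, 32))
y = local_window_attention(x, w=16)                    # shape (512, 32)
```

&lt;p>For a fixed window size $w$, each token touches at most $2w+1$ score entries, giving the $O(N \times w)$ total cost described above.&lt;/p>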
&lt;p>
&lt;figure id="figure-figure-3-the-longmil-framework-stage-1-prepares-features-stage-2-uses-local-masks-for-accelerated-attention-overall-stage-models-the-hierarchy-from-local-to-global">
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="LongMIL Architecture" srcset="
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig1_hu711c45aa60dabc8ce01810a864ea0a33_66080_47d4ff0e2529758e5b8e45df0157b493.webp 400w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig1_hu711c45aa60dabc8ce01810a864ea0a33_66080_99a9d307123b980f35063cc476536b62.webp 760w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig1_hu711c45aa60dabc8ce01810a864ea0a33_66080_1200x1200_fit_q80_h2_lanczos_2.webp 1200w"
src="https://hzzcl.github.io/resume.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig1_hu711c45aa60dabc8ce01810a864ea0a33_66080_47d4ff0e2529758e5b8e45df0157b493.webp"
width="760"
height="508"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: The LongMIL framework: Stage-1 prepares features; Stage-2 uses Local Masks for accelerated attention; Overall-stage models the hierarchy from local to global.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="results">Results&lt;/h3>
&lt;p>&lt;strong>Table 1&lt;/strong>&lt;/p>
&lt;div style="overflow-x: auto; display: block; width: 100%;">
&lt;table>
&lt;thead>&lt;tr>&lt;th>Method&lt;/th>&lt;th>ViT-S Lunit [36] F1&lt;/th>&lt;th>ViT-S Lunit [36] AUC&lt;/th>&lt;th>ViT-S DINO (our pre-train) F1&lt;/th>&lt;th>ViT-S DINO (our pre-train) AUC&lt;/th>&lt;/tr>&lt;/thead>
&lt;tbody>
&lt;tr>&lt;td>KNN (Mean)&lt;/td>&lt;td>0.503 ± 0.011&lt;/td>&lt;td>0.691 ± 0.007&lt;/td>&lt;td>0.430 ± 0.029&lt;/td>&lt;td>0.649 ± 0.008&lt;/td>&lt;/tr>
&lt;tr>&lt;td>KNN (Max)&lt;/td>&lt;td>0.472 ± 0.009&lt;/td>&lt;td>0.771 ± 0.018&lt;/td>&lt;td>0.416 ± 0.019&lt;/td>&lt;td>0.645 ± 0.007&lt;/td>&lt;/tr>
&lt;tr>&lt;td>Mean-pooling&lt;/td>&lt;td>0.534 ± 0.026&lt;/td>&lt;td>0.741 ± 0.017&lt;/td>&lt;td>0.487 ± 0.034&lt;/td>&lt;td>0.717 ± 0.020&lt;/td>&lt;/tr>
&lt;tr>&lt;td>Max-pooling&lt;/td>&lt;td>0.649 ± 0.032&lt;/td>&lt;td>0.843 ± 0.018&lt;/td>&lt;td>0.598 ± 0.032&lt;/td>&lt;td>0.818 ± 0.006&lt;/td>&lt;/tr>
&lt;tr>&lt;td>AB-MIL [32]&lt;/td>&lt;td>0.668 ± 0.032&lt;/td>&lt;td>0.866 ± 0.016&lt;/td>&lt;td>0.621 ± 0.048&lt;/td>&lt;td>0.837 ± 0.035&lt;/td>&lt;/tr>
&lt;tr>&lt;td>DS-MIL [40]&lt;/td>&lt;td>0.607 ± 0.044&lt;/td>&lt;td>0.824 ± 0.028&lt;/td>&lt;td>0.622 ± 0.063&lt;/td>&lt;td>0.808 ± 0.033&lt;/td>&lt;/tr>
&lt;tr>&lt;td>CLAM-SB [50]&lt;/td>&lt;td>0.647 ± 0.020&lt;/td>&lt;td>0.836 ± 0.021&lt;/td>&lt;td>0.627 ± 0.032&lt;/td>&lt;td>0.836 ± 0.009&lt;/td>&lt;/tr>
&lt;tr>&lt;td>DTFD-MIL MaxS [89]&lt;/td>&lt;td>0.597 ± 0.025&lt;/td>&lt;td>0.874 ± 0.026&lt;/td>&lt;td>0.521 ± 0.059&lt;/td>&lt;td>0.807 ± 0.016&lt;/td>&lt;/tr>
&lt;tr>&lt;td>DTFD-MIL AFS [89]&lt;/td>&lt;td>0.608 ± 0.083&lt;/td>&lt;td>0.869 ± 0.018&lt;/td>&lt;td>0.538 ± 0.053&lt;/td>&lt;td>0.824 ± 0.011&lt;/td>&lt;/tr>
&lt;tr>&lt;td>TransMIL [65]&lt;/td>&lt;td>0.648 ± 0.054&lt;/td>&lt;td>0.835 ± 0.031&lt;/td>&lt;td>0.591 ± 0.049&lt;/td>&lt;td>0.798 ± 0.029&lt;/td>&lt;/tr>
&lt;tr>&lt;td>Full Attention&lt;/td>&lt;td>0.689 ± 0.036&lt;/td>&lt;td>0.870 ± 0.010&lt;/td>&lt;td>0.648 ± 0.028&lt;/td>&lt;td>0.839 ± 0.018&lt;/td>&lt;/tr>
&lt;tr>&lt;td>LongMIL (ours)&lt;/td>&lt;td>0.706 ± 0.025&lt;/td>&lt;td>0.888 ± 0.019&lt;/td>&lt;td>0.657 ± 0.026&lt;/td>&lt;td>0.848 ± 0.004&lt;/td>&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div style="text-align: center; font-size: 0.6em; color: #555; margin-top: 5px;">
Table 1: Slide-level tumor subtyping on BRACS using two pre-trained embeddings. Top rows: various WSI-MIL architectures with vanilla attention (no interaction among different instances). Bottom rows: TransMIL (using Nyströmformer and learnable absolute position embedding), full attention (+RoPE), and our LongMIL.
&lt;/div>
&lt;p>&lt;strong>Table 2&lt;/strong>&lt;/p>
&lt;div style="overflow-x: auto; display: block; width: 100%;">
&lt;table>
&lt;thead>&lt;tr>&lt;th>Method&lt;/th>&lt;th>COADREAD&lt;/th>&lt;th>STAD&lt;/th>&lt;th>BRCA&lt;/th>&lt;/tr>&lt;/thead>
&lt;tbody>
&lt;tr>&lt;td>AB-MIL [32]&lt;/td>&lt;td>0.566 ± 0.075&lt;/td>&lt;td>0.562 ± 0.049&lt;/td>&lt;td>0.549 ± 0.057&lt;/td>&lt;/tr>
&lt;tr>&lt;td>AMISL [86]&lt;/td>&lt;td>0.561 ± 0.088&lt;/td>&lt;td>0.563 ± 0.067&lt;/td>&lt;td>0.545 ± 0.071&lt;/td>&lt;/tr>
&lt;tr>&lt;td>DS-MIL [40]&lt;/td>&lt;td>0.470 ± 0.053&lt;/td>&lt;td>0.546 ± 0.047&lt;/td>&lt;td>0.548 ± 0.058&lt;/td>&lt;/tr>
&lt;tr>&lt;td>GCN-MIL [43]&lt;/td>&lt;td>0.538 ± 0.049&lt;/td>&lt;td>0.513 ± 0.069&lt;/td>&lt;td>-&lt;/td>&lt;/tr>
&lt;tr>&lt;td>HIPT [9]&lt;/td>&lt;td>0.608 ± 0.088&lt;/td>&lt;td>0.570 ± 0.081&lt;/td>&lt;td>-&lt;/td>&lt;/tr>
&lt;tr>&lt;td>TransMIL [65]&lt;/td>&lt;td>0.597 ± 0.134&lt;/td>&lt;td>0.564 ± 0.080&lt;/td>&lt;td>0.587 ± 0.063&lt;/td>&lt;/tr>
&lt;tr>&lt;td>Full Attention&lt;/td>&lt;td>0.603 ± 0.048&lt;/td>&lt;td>0.568 ± 0.074&lt;/td>&lt;td>0.601 ± 0.047&lt;/td>&lt;/tr>
&lt;tr>&lt;td>LongMIL (ours)&lt;/td>&lt;td>0.624 ± 0.057&lt;/td>&lt;td>0.589 ± 0.066&lt;/td>&lt;td>0.619 ± 0.053&lt;/td>&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;div style="text-align: center; font-size: 0.6em; color: #555; margin-top: 5px;">
Table 2: Slide-level survival prediction based on HIPT [9] pre-trained embeddings with various WSI-MIL architectures, including vanilla attention, GCN, TransMIL, self-attention (HIPT with region slicing and absolute embedding), full self-attention, and our LongMIL.
&lt;/div>
&lt;p>On &lt;strong>BRACS&lt;/strong> and &lt;strong>TCGA-BRCA&lt;/strong> datasets:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Performance:&lt;/strong> F1-score reached &lt;strong>0.657&lt;/strong> on BRACS tumor subtyping, significantly outperforming SOTA methods such as TransMIL.&lt;/li>
&lt;li>&lt;strong>Extrapolation:&lt;/strong> In &amp;ldquo;train small, test large&amp;rdquo; experiments, LongMIL remained robust (p-value $\approx$ 0.1), indicating adaptability to varying WSI sizes.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-feature-mining-under-weak-supervision-attention-challenging-mil-acmil">2. Feature Mining under Weak Supervision: Attention-Challenging MIL (ACMIL)&lt;/h2>
&lt;p>&lt;strong>Original Paper:&lt;/strong> &lt;em>Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification&lt;/em> (ECCV 2024)
&lt;br>&lt;strong>Authors:&lt;/strong> Yunlong Zhang, Honglin Li, Yunxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang&lt;/p>
&lt;h3 id="the-scientific-question-attention-laziness">The Scientific Question: Attention &amp;ldquo;Laziness&amp;rdquo;&lt;/h3>
&lt;p>In Weakly Supervised Multiple Instance Learning (MIL), models tend to focus only on the most obvious discriminative regions (e.g., tumor cores), ignoring edges or atypical key features. This &amp;ldquo;Attention Laziness&amp;rdquo; leads to poor generalization on heterogeneous tumors.&lt;/p>
&lt;p>
&lt;figure id="figure-figure-3-motivation-of-mba-left--figure-4-motivation-of-stkim">
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="ACMIL Comparison" srcset="
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig2_hu3e5b7b25a6d59df4392e632fa46e4b39_56758_8f8a9c68d8e0fd7523c33eaa47f89137.webp 400w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig2_hu3e5b7b25a6d59df4392e632fa46e4b39_56758_96e810c97dde816de711fbff4bfac396.webp 760w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig2_hu3e5b7b25a6d59df4392e632fa46e4b39_56758_1200x1200_fit_q80_h2_lanczos_2.webp 1200w"
src="https://hzzcl.github.io/resume.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig2_hu3e5b7b25a6d59df4392e632fa46e4b39_56758_8f8a9c68d8e0fd7523c33eaa47f89137.webp"
width="760"
height="437"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: Motivation of MBA (left) &amp;amp; Figure 4: Motivation of STKIM.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="core-method-adversarial-attention-enhancement">Core Method: Adversarial Attention Enhancement&lt;/h3>
&lt;p>We propose the &lt;strong>ACMIL&lt;/strong> framework to &amp;ldquo;manufacture difficulty&amp;rdquo; for the model:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Multi-Branch Attention (MBA):&lt;/strong> Parallel attention branches capture distinct clustering patterns in the feature space (verified via UMAP), covering more diverse pathological features.&lt;/li>
&lt;li>&lt;strong>Stochastic Top-K Instance Masking (STKIM):&lt;/strong> During training, we randomly &amp;ldquo;mask&amp;rdquo; the Top-K instances with the highest attention scores, forcing the model to mine discriminative evidence from the remaining, less salient instances.&lt;/li>
&lt;/ul>
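&lt;p>A toy sketch of STKIM&amp;rsquo;s masking step (the interface and the re-normalization below are our assumptions, not the paper&amp;rsquo;s exact formulation):&lt;/p>

```python
import numpy as np

def stochastic_topk_mask(scores, k, p, rng):
    """Toy sketch of STKIM: each of the k highest-scoring instances is
    suppressed with probability p, so the remaining attention mass must
    flow to less salient instances. (Re-normalisation is our assumption.)"""
    masked = scores.astype(float)
    topk = np.argsort(masked)[-k:]             # indices of the top-k scores
    drop = topk[p > rng.random(k)]             # random subset to suppress
    masked[drop] = -np.inf                     # excluded from the softmax
    weights = np.exp(masked - masked[np.isfinite(masked)].max())
    return weights / weights.sum()

scores = np.array([0.1, 2.0, 0.3, 1.5, 0.2])
rng = np.random.default_rng(0)
w = stochastic_topk_mask(scores, k=2, p=1.0, rng=rng)
# with p=1.0 the two top-scoring instances (indices 1 and 3) get zero weight
```

&lt;p>With mask probability 1 the dominant instances vanish entirely, so the gradient must come from the periphery; in practice a moderate $p$ trades off coverage against stability.&lt;/p>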
&lt;p>
&lt;figure id="figure-figure-6-heatmap-comparison-showing-acmil-right-covering-broader-tumor-regions-than-the-baseline-left">
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="ACMIL Comparison" srcset="
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig3_hu3e307508bcfe60c6dd042fed8e982faf_76954_450ec0f6de286e3cf69dec583944ad5e.webp 400w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig3_hu3e307508bcfe60c6dd042fed8e982faf_76954_3e8a1d3b823477a55556f93c8a427298.webp 760w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig3_hu3e307508bcfe60c6dd042fed8e982faf_76954_1200x1200_fit_q80_h2_lanczos_2.webp 1200w"
src="https://hzzcl.github.io/resume.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig3_hu3e307508bcfe60c6dd042fed8e982faf_76954_450ec0f6de286e3cf69dec583944ad5e.webp"
width="760"
height="563"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 6: Heatmap comparison showing ACMIL (right) covering broader tumor regions than the baseline (left).
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="results-1">Results&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Camelyon16:&lt;/strong> Achieved an AUC of &lt;strong>0.954&lt;/strong>, outperforming methods like DTFD-MIL.&lt;/li>
&lt;li>&lt;strong>TCGA-LBC:&lt;/strong> AUC increased to &lt;strong>0.901&lt;/strong> on liquid-based cytology data, proving effectiveness in sparse feature mining.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="3-addressing-missing-clinical-data-bidirectional-distillation">3. Addressing Missing Clinical Data: Bidirectional Distillation&lt;/h2>
&lt;p>&lt;strong>Original Paper:&lt;/strong> &lt;em>Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis&lt;/em> (BIBM 2023)
&lt;br>&lt;strong>Authors:&lt;/strong> Shichuan Zhang, Sunyi Zheng, Zhongyi Shui, Honglin Li, Lin Yang&lt;/p>
&lt;h3 id="the-scientific-question-the-multimodal-bucket-effect">The Scientific Question: The Multimodal &amp;ldquo;Bucket Effect&amp;rdquo;&lt;/h3>
&lt;p>In clinical practice, WSI and tabular data (genomics, clinical markers) are often asynchronous. Existing multimodal models often suffer a severe performance drop—sometimes below single-modal baselines—when clinical data is missing.&lt;/p>
&lt;h3 id="core-method-bidirectional-distillation--learnable-prompts">Core Method: Bidirectional Distillation &amp;amp; Learnable Prompts&lt;/h3>
&lt;p>We propose a &lt;strong>Bidirectional Distillation (BD)&lt;/strong> framework to teach the model how to handle missingness:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Decoupling:&lt;/strong> Parallel &amp;ldquo;Single-Modal Branch&amp;rdquo; (WSI only) and &amp;ldquo;Multi-Modal Branch&amp;rdquo; (WSI + Clinical).&lt;/li>
&lt;li>&lt;strong>Learnable Prompt:&lt;/strong> A learnable vector acts as a placeholder for missing modalities in the single-modal branch.&lt;/li>
&lt;li>&lt;strong>Bidirectional Distillation:&lt;/strong> We distill fused knowledge from Multi $\to$ Single ($\mathcal{M} \to \mathcal{S}$) and distill pure image features back from Single $\to$ Multi ($\mathcal{S} \to \mathcal{M}$) to prevent noise interference.&lt;/li>
&lt;/ul>
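&lt;p>The distillation objective can be sketched as two KL terms, one per direction (variable names and toy logits are ours, not the paper&amp;rsquo;s):&lt;/p>

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q, eps=1e-8):
    # KL(p || q): the usual distillation objective between branch outputs
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy sketch of the bidirectional objective: the multi-modal branch
# (WSI + clinical) teaches the single-modal branch (WSI + prompt), and the
# single-modal branch distils pure image knowledge back, regularising the
# fused branch against noise from the clinical modality.
prompt = np.zeros(16)                        # learnable missing-modality placeholder
logits_multi = np.array([2.0, 0.5, -1.0])    # multi-modal branch output
logits_single = np.array([1.2, 0.8, -0.5])   # single-modal branch output

loss_m_to_s = kl(softmax(logits_multi), softmax(logits_single))  # M -> S
loss_s_to_m = kl(softmax(logits_single), softmax(logits_multi))  # S -> M
loss_bd = loss_m_to_s + loss_s_to_m          # symmetric distillation term
```

&lt;p>At inference with missing clinical data, only the single-modal branch (with the prompt standing in for the absent modality) needs to run.&lt;/p>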
&lt;p>
&lt;figure id="figure-figure-2-the-bd-structure-showing-parallel-branches-the-learnable-prompt-and-the-bidirectional-distillation-loss-paths">
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="BD Framework" srcset="
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig4_hu78c89133f1d28c40cec6aab15382bb0b_32450_08bd870a8b0d966025e8bd134ac32c74.webp 400w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig4_hu78c89133f1d28c40cec6aab15382bb0b_32450_46d9ad27adeddddf8180b35634840629.webp 760w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig4_hu78c89133f1d28c40cec6aab15382bb0b_32450_1200x1200_fit_q80_h2_lanczos_2.webp 1200w"
src="https://hzzcl.github.io/resume.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig4_hu78c89133f1d28c40cec6aab15382bb0b_32450_08bd870a8b0d966025e8bd134ac32c74.webp"
width="760"
height="390"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 2: The BD structure showing parallel branches, the Learnable Prompt, and the bidirectional distillation loss paths.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="results-2">Results&lt;/h3>
&lt;p>In BCNB Breast Cancer Lymph Node Metastasis prediction:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Resilience:&lt;/strong> With &lt;strong>80%-100%&lt;/strong> clinical data missing, BD maintained an F1-score of &lt;strong>~74.9%&lt;/strong>, while direct filling methods crashed to below 68%.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="4-low-cost-wsi-adaptation-variational-information-bottleneck-fine-tuning">4. Low-Cost WSI Adaptation: Variational Information Bottleneck Fine-tuning&lt;/h2>
&lt;p>&lt;strong>Original Paper:&lt;/strong> &lt;em>Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification&lt;/em> (CVPR 2023)
&lt;br>&lt;strong>Authors:&lt;/strong> Honglin Li, Chenglu Zhu, Yunlong Zhang, Yuxuan Sun, Zhongyi Shui, Wenwei Kuang, Sunyi Zheng, Lin Yang&lt;/p>
&lt;h3 id="the-scientific-question-the-wsi-compute-wall">The Scientific Question: The WSI &amp;ldquo;Compute Wall&amp;rdquo;&lt;/h3>
&lt;p>Pathology models typically use ImageNet pre-trained backbones, which suffer from a domain gap. However, end-to-end full fine-tuning on WSIs (thousands of patches) requires VRAM far beyond standard GPU capabilities.&lt;/p>
&lt;h3 id="core-method-sparse-critical-instance-selection">Core Method: Sparse Critical Instance Selection&lt;/h3>
&lt;p>Based on &lt;strong>Variational Information Bottleneck (VIB)&lt;/strong> theory, we screen for the &amp;ldquo;minimal sufficient statistics&amp;rdquo;:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>IB Module Screening:&lt;/strong> A lightweight module selects the Top-K diagnostic instances (usually &amp;lt;1000) based on mutual information maximization.&lt;/li>
&lt;li>&lt;strong>Sparse Backpropagation:&lt;/strong> Gradients are back-propagated &lt;strong>only&lt;/strong> through selected instances during fine-tuning, reducing computational overhead by &lt;strong>&amp;gt;10x&lt;/strong>.&lt;/li>
&lt;/ol>
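&lt;p>The selection step can be sketched as follows (a deliberate simplification with made-up sizes; the actual VIB module is learned end-to-end, not a fixed random scorer):&lt;/p>

```python
import numpy as np

# Sketch of sparse instance selection: a cheap linear scorer ranks all N
# patch features and only the top-K reach the expensive backbone during
# fine-tuning, so the backward pass touches K instances instead of N.
rng = np.random.default_rng(0)
N, K, D = 20000, 512, 384                 # patches per WSI, kept set, feat dim
feats = rng.standard_normal((N, D)).astype(np.float32)

w_scorer = rng.standard_normal(D).astype(np.float32) / np.sqrt(D)
scores = feats @ w_scorer                 # one cheap relevance score per instance
keep = np.argsort(scores)[-K:]            # indices of the top-K instances

selected = feats[keep]                    # only these see backbone gradients
print(selected.shape, round(N / K))       # (512, 384) 39
```

&lt;p>Here only 512 of 20,000 instances would be re-encoded with gradients enabled, which is where the &amp;gt;10x reduction in fine-tuning cost comes from.&lt;/p>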
&lt;p>
&lt;figure id="figure-figure-3-the-three-stage-vib-process-learning-the-bottleneck---sparse-representation-fine-tuning---retraining-the-wsi-head">
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="VIB Fine-tuning" srcset="
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig5_hua46c3dbc047e93eecb871c2a8bfa86e4_32462_ad15a90ec3ba0a173528e4a1266a73db.webp 400w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig5_hua46c3dbc047e93eecb871c2a8bfa86e4_32462_d4cc262d81b0c43b9b27a21c0868aad8.webp 760w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig5_hua46c3dbc047e93eecb871c2a8bfa86e4_32462_1200x1200_fit_q80_h2_lanczos_2.webp 1200w"
src="https://hzzcl.github.io/resume.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig5_hua46c3dbc047e93eecb871c2a8bfa86e4_32462_ad15a90ec3ba0a173528e4a1266a73db.webp"
width="760"
height="332"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 3: The three-stage VIB process: Learning the Bottleneck -&amp;gt; Sparse representation fine-tuning -&amp;gt; Retraining the WSI head.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;h3 id="results-3">Results&lt;/h3>
&lt;div style="overflow-x: auto; display: block; width: 100%;">
Method,Camelyon-16 F1,Camelyon-16 AUC,TCGA-BRCA F1,TCGA-BRCA AUC,LBP-CECA F1,LBP-CECA AUC
Full Supervision,0.967±0.005,0.992±0.003,-,-,0.741±0.006,0.942±0.002
RNN-MIL [7],0.834±0.017,0.861±0.021,0.776±0.035,0.871±0.033,-,-
AB-MIL [19],0.828±0.013,0.851±0.025,0.771±0.040,0.869±0.037,0.525±0.017,0.845±0.002
DS-MIL [25],0.857±0.023,0.892±0.012,0.775±0.044,0.875±0.041,-,-
CLAM-SB [30],0.839±0.018,0.875±0.028,0.797±0.046,0.879±0.019,0.587±0.014,0.860±0.005
TransMIL [38],0.846±0.013,0.883±0.009,0.806±0.046,0.889±0.036,0.533±0.006,0.850±0.007
DTFD-MIL [45],0.882±0.008,0.932±0.016,0.816±0.045,0.895±0.042,0.569±0.026,0.847±0.003
FT+ CLAM-SB,0.911±0.017,0.956±0.013,0.845±0.032,0.935±0.027,0.718±0.010,0.907±0.005
FT+ TransMIL,0.923±0.012,0.967±0.003,0.848±0.044,0.945±0.020,0.720±0.024,0.918±0.004
FT+ DTFD-MIL,0.921±0.007,0.962±0.006,0.849±0.027,0.951±0.016,0.723±0.008,0.922±0.005
Mean-pooling,0.629±0.029,0.591±0.012,0.818±0.022,0.910±0.032,0.350±0.017,0.735±0.006
Max-pooling,0.805±0.012,0.824±0.016,0.644±0.179,0.826±0.096,0.636±0.064,0.893±0.019
KNN (Mean),0.468±0.000,0.506±0.000,0.633±0.066,0.749±0.055,0.393±0.000,0.650±0.000
KNN (Max),0.559±0.000,0.535±0.000,0.524±0.032,0.639±0.063,0.477±0.000,0.743±0.000
FT+ Mean-pooling,0.842±0.006,0.831±0.007,0.866±0.035,0.952±0.018,0.685±0.014,0.900±0.002
FT+ Max-pooling,0.927±0.011,0.969±0.004,0.852±0.043,0.948±0.019,0.695±0.013,0.912±0.004
FT+ KNN (Mean),0.505±0.000,0.526±0.000,0.784±0.044,0.907±0.034,0.529±0.000,0.737±0.000
FT+ KNN (Max),0.905±0.000,0.916±0.000,0.802±0.063,0.882±0.036,0.676±0.000,0.875±0.000
&lt;/div>
&lt;div style="text-align: center; font-size: 0.6em; color: #555; margin-top: 5px;">
Table 3. Slide-Level Classification by using the IN-lK pre-trained backbone or the proposed fine-tuned (FT) in three datasets. Top RowsDifierent Mll, architectures are compared to select the top 3 $OTA methods to validate the transfer learning performance using the IN-lKbre-trained backbone or the FT, Bottom Rows. The competition of various traditional aggrcgalion and feature evaluation methods by usingore-trained IN-lK or the FT.
&lt;/div>
&lt;p>
&lt;figure id="figure-figure-6-t-sne-visualization-of-different-representations-onpatches-our-method-converts-chaotic-imagenet-lk-and-ssl-fea-tures-into-a-more-task-specifc-and-separable-distribution-thecluster-evaluation-measurement-v-scores-show-weakly-super-vised-fine-tuned-features-are-more-close-to-full-supervision-com-pared-to-others-a-imagenet-1k-pretraining-b-full-patch-supervi-sioncself-supervised-learning-d-fine-tuning-with-wsi-labels">
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img alt="VIB T-SNE" srcset="
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig6_hue4f6be31abb9cb7384858f7bd2691b10_45724_02cef4cbdb0fbde3da10aae5adb9b403.webp 400w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig6_hue4f6be31abb9cb7384858f7bd2691b10_45724_d2eb8a2651cf986bb2ca3f6dac87e01c.webp 760w,
/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig6_hue4f6be31abb9cb7384858f7bd2691b10_45724_1200x1200_fit_q80_h2_lanczos_2.webp 1200w"
src="https://hzzcl.github.io/resume.io/resume.io/insights/reconstructing-computational-paradigms-for-pathological-image-analysis/fig6_hue4f6be31abb9cb7384858f7bd2691b10_45724_02cef4cbdb0fbde3da10aae5adb9b403.webp"
width="486"
height="540"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Figure 6: T-SNE visualization of different representations on patches. Our method converts chaotic ImageNet-1K and SSL features into a more task-specific and separable distribution. The cluster-evaluation measure (v-score) shows that weakly-supervised fine-tuned features are closer to full supervision than the others. a. ImageNet-1K pretraining. b. Full patch supervision. c. Self-supervised learning. d. Fine-tuning with WSI labels.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Performance Leap:&lt;/strong> On Camelyon16, a VIB fine-tuned ResNet-50 with simple Max-pooling achieved an AUC of &lt;strong>0.969&lt;/strong>, a &lt;strong>14.5-point&lt;/strong> absolute gain over the ImageNet-pretrained baseline (0.824).&lt;/li>
&lt;li>&lt;strong>Feature Space:&lt;/strong> t-SNE visualization confirms significantly improved inter-class separation.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;p>This research directly targets the &amp;ldquo;compute&amp;rdquo; and &amp;ldquo;data&amp;rdquo; bottlenecks in pathological AI deployment.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>LongMIL &amp;amp; ACMIL&lt;/strong> reconstruct WSI attention mechanisms.&lt;/li>
&lt;li>&lt;strong>BD Framework&lt;/strong> solves the pain point of missing clinical data.&lt;/li>
&lt;li>&lt;strong>VIB Fine-tuning&lt;/strong> breaks the compute barrier for large-scale model optimization.&lt;/li>
&lt;/ul>
&lt;p>Together, these provide the core algorithmic support for building high-precision, low-cost, and robust pathological AI systems.&lt;/p></description></item></channel></rss>