showmaker update #18

Open · wants to merge 29 commits into base: main
43 changes: 43 additions & 0 deletions .github/workflows/static.yml
@@ -0,0 +1,43 @@
# Simple workflow for deploying static content to GitHub Pages
name: Deploy static content to Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Single deploy job since we're just deploying
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          # Upload entire repository
          path: '.'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
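A note on the upload step: `path: '.'` publishes the entire repository to Pages, including the workflow files and any source assets. If the site's files were ever moved into a subdirectory, the upload could be narrowed — a sketch, assuming a hypothetical `./site` layout that is not part of this PR:

```yaml
- name: Upload artifact
  uses: actions/upload-pages-artifact@v3
  with:
    # Publish only the site subdirectory instead of the whole repository
    path: './site'
```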
374 changes: 86 additions & 288 deletions index.html
@@ -3,10 +3,10 @@
<head>
<meta charset="utf-8">
<meta name="description"
content="Deformable Neural Radiance Fields creates free-viewpoint portraits (nerfies) from casually captured videos.">
content="ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling.">
<meta name="keywords" content="Nerfies, D-NeRF, NeRF">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Nerfies: Deformable Neural Radiance Fields</title>
<title>ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling</title>

<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-PYVRSFMDRL"></script>
@@ -32,7 +32,7 @@
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<link rel="icon" href="./static/images/logo.svg">

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
@@ -42,130 +42,107 @@
</head>
<body>

<nav class="navbar" role="navigation" aria-label="main navigation">
<div class="navbar-brand">
<a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
</a>
</div>
<div class="navbar-menu">
<div class="navbar-start" style="flex-grow: 1; justify-content: center;">
<a class="navbar-item" href="https://keunhong.com">
<span class="icon">
<i class="fas fa-home"></i>
</span>
</a>

<div class="navbar-item has-dropdown is-hoverable">
<a class="navbar-link">
More Research
</a>
<div class="navbar-dropdown">
<a class="navbar-item" href="https://hypernerf.github.io">
HyperNeRF
</a>
<a class="navbar-item" href="https://nerfies.github.io">
Nerfies
</a>
<a class="navbar-item" href="https://latentfusion.github.io">
LatentFusion
</a>
<a class="navbar-item" href="https://photoshape.github.io">
PhotoShape
</a>
</div>
</div>
</div>

</div>
</nav>


<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Nerfies: Deformable Neural Radiance Fields</h1>
<h1 class="title is-2 publication-title">ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling</h1>
<h4 class="subtitle is-5"><em>NeurIPS 2024</em></h4>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://keunhong.com">Keunhong Park</a><sup>1</sup>,</span>
<a>Quanwei Yang</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://guanjz20.github.io/">Jiazhi Guan</a><sup>2</sup>,</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=2Pedf3EAAAAJ">Kaisiyuan Wang</a><sup>3*</sup>,
</span>
<span class="author-block">
<a href="http://home.ustc.edu.cn/~yuly/">Lingyun Yu</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=1Ae0CMgAAAAJ&hl=en">Wenqing Chu</a><sup>3</sup>,
</span>
<span class="author-block">
<a href="https://utkarshsinha.com">Utkarsh Sinha</a><sup>2</sup>,</span>
<a href="https://hangz-nju-cuhk.github.io/">Hang Zhou</a><sup>3</sup>,
</span>
<span class="author-block">
<a href="https://jonbarron.info">Jonathan T. Barron</a><sup>2</sup>,
<a>Zhiqiang Feng</a><sup>3</sup>,
</span>
<span class="author-block">
<a href="http://sofienbouaziz.com">Sofien Bouaziz</a><sup>2</sup>,
<a href="https://scholar.google.com/citations?hl=zh-CN&user=pnuQ5UsAAAAJ&view_op=list_works&sortby=pubdate">Haocheng Feng</a><sup>3</sup>
</span>
<span class="author-block">
<a href="https://www.danbgoldman.com">Dan B Goldman</a><sup>2</sup>,
<a href="https://scholar.google.com/citations?user=1wzEtxcAAAAJ">Errui Ding</a><sup>3</sup>
</span>
<span class="author-block">
<a href="https://homes.cs.washington.edu/~seitz/">Steven M. Seitz</a><sup>1,2</sup>,
<a href="https://jingdongwang2017.github.io/">Jingdong Wang</a><sup>3</sup>
</span>
<span class="author-block">
<a href="http://www.ricardomartinbrualla.com">Ricardo Martin-Brualla</a><sup>2</sup>
<a href="https://faculty.ustc.edu.cn/xiehongtao/zh_CN/index.htm">Hongtao Xie</a><sup>1*</sup>
</span>

</div>

<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>University of Washington,</span>
<span class="author-block"><sup>2</sup>Google Research</span>
<span class="author-block"><sup>1</sup>University of Science and Technology of China,</span>
<span class="author-block"><sup>2</sup>Tsinghua University,</span>
<span class="author-block"><sup>3</sup>Department of Computer Vision Technology (VIS), Baidu Inc.</span>

</div>

<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<!-- <span class="link-block">
<a href="https://arxiv.org/pdf/2011.12948"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
</span> -->
<!-- <span class="link-block">
<a href="https://arxiv.org/abs/2011.12948"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
</span> -->
<!-- Video Link. -->
<span class="link-block">
<!-- <span class="link-block">
<a href="https://www.youtube.com/watch?v=MrKrnHhk8IA"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-youtube"></i>
</span>
<span>Video</span>
</a>
</span>
</span> -->
<!-- Code Link. -->
<span class="link-block">
<!-- <span class="link-block">
<a href="https://github.com/google/nerfies"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
</span> -->
<!-- Dataset Link. -->
<span class="link-block">
<!-- <span class="link-block">
<a href="https://github.com/google/nerfies/releases/tag/0.1"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="far fa-images"></i>
</span>
<span>Data</span>
</a>
</a> -->
</div>

</div>
@@ -175,79 +152,35 @@ <h1 class="title is-1 publication-title">Nerfies: Deformable Neural Radiance Fie
</div>
</section>



<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<video id="teaser" autoplay muted loop playsinline height="100%">
<source src="./static/videos/teaser.mp4"
type="video/mp4">
</video>

<img src="./static/images/pipeline_seth2.png" width="800" height="500"
alt="Overview of the ShowMaker framework.">

<h2 class="subtitle has-text-centered">
<span class="dnerf">Nerfies</span> turns selfie videos from your phone into
free-viewpoint
portraits.
An overview of our proposed framework, ShowMaker.
</h2>
</div>

</div>
</section>


<section class="hero is-light is-small">
<div class="hero-body">
<div class="container">
<div id="results-carousel" class="carousel results-carousel">
<div class="item item-steve">
<video poster="" id="steve" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/steve.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-chair-tp">
<video poster="" id="chair-tp" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/chair-tp.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-shiba">
<video poster="" id="shiba" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/shiba.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-fullbody">
<video poster="" id="fullbody" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/fullbody.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-blueshirt">
<video poster="" id="blueshirt" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/blueshirt.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-mask">
<video poster="" id="mask" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/mask.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-coffee">
<video poster="" id="coffee" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/coffee.mp4"
type="video/mp4">
</video>
</div>
<div class="item item-toby">
<video poster="" id="toby" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/toby2.mp4"
type="video/mp4">
</video>
</div>
<video autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/steve.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
</section>
</section>






<section class="section">
@@ -258,31 +191,21 @@ <h2 class="subtitle has-text-centered">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
We present the first method capable of photorealistically reconstructing a non-rigidly
deforming scene using photos/videos captured casually from mobile phones.
Although significant progress has been made in human video generation, most previous studies focus on either facial animation or full-body animation; neither can be directly applied to producing realistic conversational human videos that exhibit frequent hand gestures and varied facial movements simultaneously.
</p>
<p>
Our approach augments neural radiance fields
(NeRF) by optimizing an
additional continuous volumetric deformation field that warps each observed point into a
canonical 5D NeRF.
We observe that these NeRF-like deformation fields are prone to local minima, and
propose a coarse-to-fine optimization method for coordinate-based models that allows for
more robust optimization.
By adapting principles from geometry processing and physical simulation to NeRF-like
models, we propose an elastic regularization of the deformation field that further
improves robustness.
To address these limitations, we propose a 2D human video generation framework, named ShowMaker, capable of generating high-fidelity half-body conversational
videos based on 2D key points via fine-grained diffusion modeling.
We leverage dual-stream diffusion models as the backbone of our framework and carefully design two novel components for crucial local regions (i.e., hands and face) that can
be easily integrated into our backbone.
Specifically, to handle the challenging hand generation caused by sparse motion guidance, we propose a novel Key Point-based Fine-grained Hand Modeling module by amplifying positional information from
raw hand key points and constructing a corresponding key point-based codebook.
Moreover, to restore richer facial details in generated results, we introduce a Face Recapture module, which extracts facial texture features and global identity features
from the aligned human face and integrates them into the diffusion process for face enhancement.
</p>
<p>
We show that <span class="dnerf">Nerfies</span> can turn casually captured selfie
photos/videos into deformable NeRF
models that allow for photorealistic renderings of the subject from arbitrary
viewpoints, which we dub <i>"nerfies"</i>. We evaluate our method by collecting data
using a
rig with two mobile phones that take time-synchronized photos, yielding train/validation
images of the same pose at different viewpoints. We show that our method faithfully
reconstructs non-rigidly deforming scenes and reproduces unseen views with high
fidelity.
Extensive quantitative and qualitative experiments demonstrate the superior visual quality and temporal consistency of our method.
</p>
</div>
</div>
@@ -293,162 +216,37 @@ <h2 class="title is-3">Abstract</h2>
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Video</h2>
<div class="publication-video">
<iframe src="https://www.youtube.com/embed/MrKrnHhk8IA?rel=0&amp;showinfo=0"
frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
</div>
<!-- <div class="publication-video"> -->
<!-- <video poster="" id="paper-video" autoplay controls muted loop playsinline height="100%"> -->
<!-- <video width="800" height="600" controls>
<source src="https://www.youtube.com/watch?v=jWvAglDGWxA&t=1s" type="video/mp4">
</video> -->
<iframe width="800" height="480"
src="https://www.youtube.com/embed/Uw8CUv5d_YY?si=J7IG9QdlJF0e4wWy" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>
</iframe>
<!-- </video> -->
<!-- </div> -->
</div>
</div>
<!--/ Paper video. -->
</div>
</section>


<section class="section">
<div class="container is-max-desktop">

<div class="columns is-centered">

<!-- Visual Effects. -->
<div class="column">
<div class="content">
<h2 class="title is-3">Visual Effects</h2>
<p>
Using <i>nerfies</i> you can create fun visual effects. This Dolly zoom effect
would be impossible without nerfies since it would require going through a wall.
</p>
<video id="dollyzoom" autoplay controls muted loop playsinline height="100%">
<source src="./static/videos/dollyzoom-stacked.mp4"
type="video/mp4">
</video>
</div>
</div>
<!--/ Visual Effects. -->

<!-- Matting. -->
<div class="column">
<h2 class="title is-3">Matting</h2>
<div class="columns is-centered">
<div class="column content">
<p>
As a byproduct of our method, we can also solve the matting problem by ignoring
samples that fall outside of a bounding box during rendering.
</p>
<video id="matting-video" controls playsinline height="100%">
<source src="./static/videos/matting.mp4"
type="video/mp4">
</video>
</div>

</div>
</div>
</div>
<!--/ Matting. -->

<!-- Animation. -->
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Animation</h2>

<!-- Interpolating. -->
<h3 class="title is-4">Interpolating states</h3>
<div class="content has-text-justified">
<p>
We can also animate the scene by interpolating the deformation latent codes of two input
frames. Use the slider here to linearly interpolate between the left frame and the right
frame.
</p>
</div>
<div class="columns is-vcentered interpolation-panel">
<div class="column is-3 has-text-centered">
<img src="./static/images/interpolate_start.jpg"
class="interpolation-image"
alt="Interpolate start reference image."/>
<p>Start Frame</p>
</div>
<div class="column interpolation-video-column">
<div id="interpolation-image-wrapper">
Loading...
</div>
<input class="slider is-fullwidth is-large is-info"
id="interpolation-slider"
step="1" min="0" max="100" value="0" type="range">
</div>
<div class="column is-3 has-text-centered">
<img src="./static/images/interpolate_end.jpg"
class="interpolation-image"
alt="Interpolation end reference image."/>
<p class="is-bold">End Frame</p>
</div>
</div>
<br/>
<!--/ Interpolating. -->

<!-- Re-rendering. -->
<h3 class="title is-4">Re-rendering the input video</h3>
<div class="content has-text-justified">
<p>
Using <span class="dnerf">Nerfies</span>, you can re-render a video from a novel
viewpoint such as a stabilized camera by playing back the training deformations.
</p>
</div>
<div class="content has-text-centered">
<video id="replay-video"
controls
muted
preload
playsinline
width="75%">
<source src="./static/videos/replay.mp4"
type="video/mp4">
</video>
</div>
<!--/ Re-rendering. -->

</div>
</div>
<!--/ Animation. -->


<!-- Concurrent Work. -->
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Related Links</h2>

<div class="content has-text-justified">
<p>
There's a lot of excellent work that was introduced around the same time as ours.
</p>
<p>
<a href="https://arxiv.org/abs/2104.09125">Progressive Encoding for Neural Optimization</a> introduces an idea similar to our windowed position encoding for coarse-to-fine optimization.
</p>
<p>
<a href="https://www.albertpumarola.com/research/D-NeRF/index.html">D-NeRF</a> and <a href="https://gvv.mpi-inf.mpg.de/projects/nonrigid_nerf/">NR-NeRF</a>
both use deformation fields to model non-rigid scenes.
</p>
<p>
Some works model videos with a NeRF by directly modulating the density, such as <a href="https://video-nerf.github.io/">Video-NeRF</a>, <a href="https://www.cs.cornell.edu/~zl548/NSFF/">NSFF</a>, and <a href="https://neural-3d-video.github.io/">DyNeRF</a>
</p>
<p>
There are probably many more by the time you are reading this. Check out <a href="https://dellaert.github.io/NeRF/">Frank Dellart's survey on recent NeRF papers</a>, and <a href="https://github.com/yenchenlin/awesome-NeRF">Yen-Chen Lin's curated list of NeRF papers</a>.
</p>
</div>
</div>
</div>
<!--/ Concurrent Work. -->

</div>
</section>


<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@article{park2021nerfies,
author = {Park, Keunhong and Sinha, Utkarsh and Barron, Jonathan T. and Bouaziz, Sofien and Goldman, Dan B and Seitz, Steven M. and Martin-Brualla, Ricardo},
title = {Nerfies: Deformable Neural Radiance Fields},
journal = {ICCV},
year = {2021},
<pre><code>@inproceedings{yang2024showmaker,
  author    = {Yang, Quanwei and Guan, Jiazhi and Wang, Kaisiyuan and Yu, Lingyun and Chu, Wenqing and Zhou, Hang and Feng, Zhiqiang and Feng, Haocheng and Ding, Errui and Wang, Jingdong and Xie, Hongtao},
  title     = {ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
}</code></pre>
</div>
</section>
@@ -474,11 +272,11 @@ <h2 class="title">BibTeX</h2>
Commons Attribution-ShareAlike 4.0 International License</a>.
</p>
<p>
This means you are free to borrow the <a
<!-- This means you are free to borrow the <a
href="https://github.com/nerfies/nerfies.github.io">source code</a> of this website,
we just ask that you link back to this page in the footer.
Please remember to remove the analytics code included in the header of the website which
you do not want on your website.
you do not want on your website. -->
</p>
</div>
</div>
652 changes: 652 additions & 0 deletions static/images/logo.svg
Binary file added static/images/pipeline_seth.pdf
Binary file not shown.
Binary file added static/images/pipeline_seth2.png
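Since the workflow in `.github/workflows/static.yml` publishes the repository root as-is, a quick way to sanity-check what Pages will serve is to run a throwaway local HTTP server against the same directory. A minimal sketch using only the Python standard library (the port and directory choices here are illustrative, not part of this PR):

```python
# Serve the current directory on an ephemeral port and fetch the root URL,
# mimicking how GitHub Pages serves the uploaded artifact.
import http.server
import threading
import urllib.request

server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler
)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
# A 200 response means the directory (and index.html, if present) is reachable.
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/").status
server.shutdown()
print(status)
```

This only checks that files are served; it does not reproduce Pages-specific behavior such as custom 404 pages or HTTPS redirects.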