
Apple Releases Depth Pro, an Open Source Monocular Depth Estimation AI Model


Apple has released several open-source artificial intelligence (AI) models this year. These are mostly small language models designed for specific tasks. Adding to the list, the Cupertino-based tech giant has now released a new AI model dubbed Depth Pro. It is a vision model that can generate monocular depth maps of any image. This technology is useful in the generation of 3D textures, augmented reality (AR), and more. The researchers behind the project claim that the depth maps generated by the AI are better than those generated with the help of multiple cameras.

Apple Releases Depth Pro AI Model

Depth estimation is an important process in 3D modelling as well as in various other technologies such as AR, autonomous driving systems, robotics, and more. The human eye is a complex lens system that can accurately gauge the depth of objects even when observing them from a single-point perspective. Cameras, however, are not as good at it. Images taken with a single camera appear two-dimensional, removing depth from the equation.

So, for technologies where the depth of an object plays an important role, multiple cameras are used. However, modelling objects this way can be time-consuming and resource-intensive. Instead, in a research paper titled “Depth Pro: Sharp Monocular Metric Depth in Less Than a Second”, Apple highlighted how it used a vision-based AI model to generate zero-shot depth maps from monocular images of objects.


How the Depth Pro AI model generates depth maps
Photo Credit: Apple

To develop the AI model, the researchers used a Vision Transformer-based (ViT) architecture. The output resolution was fixed at 384 x 384, but the input and processing resolution was kept at 1536 x 1536, giving the AI model more room to capture fine details.
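
To illustrate the resolution gap described above, the sketch below tiles a 1536 x 1536 image into 384 x 384 patches of the kind a ViT encoder consumes. The two resolution values come from the article; the tiling function itself is a simplified assumption for demonstration and is not Depth Pro's actual multi-scale scheme.

```python
import torch

INPUT_RES = 1536   # input/processing resolution reported for Depth Pro
PATCH_RES = 384    # ViT working resolution reported for Depth Pro

def tile_image(image: torch.Tensor, patch: int = PATCH_RES) -> torch.Tensor:
    """Split a (C, H, W) image into non-overlapping (N, C, patch, patch) tiles."""
    c, h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "resolution must divide evenly"
    tiles = image.unfold(1, patch, patch).unfold(2, patch, patch)
    # (C, H//p, W//p, p, p) -> (N, C, p, p)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

if __name__ == "__main__":
    dummy = torch.rand(3, INPUT_RES, INPUT_RES)  # stand-in RGB image
    patches = tile_image(dummy)
    print(patches.shape)  # torch.Size([16, 3, 384, 384])
```

A high-resolution input broken into 16 ViT-sized patches like this is one way a network can keep fine detail while the transformer encoder still operates at its native 384 x 384 resolution.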

In the pre-print version of the paper, which is currently available on the online repository arXiv, the researchers claimed that the AI model can accurately generate depth maps of visually complex subjects such as a cage, a furry cat’s body and whiskers, and more. The generation time is said to be one second. The weights of the open-source AI model are currently hosted on a GitHub listing. Interested individuals can run inference with the model on a single GPU.
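
For readers who want to try it, the snippet below is a minimal sketch of running inference based on the usage shown in the GitHub listing. The `depth_pro` package name and its `create_model_and_transforms`, `load_rgb`, and `infer` calls are taken from that listing and may change; treat the exact API, and the example image path, as assumptions.

```python
import depth_pro  # package from the GitHub listing (apple/ml-depth-pro)

# Minimal sketch, assuming the API documented in the repository.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an RGB image; f_px is the focal length in pixels if it can be read
# from the image metadata (otherwise the model estimates it).
image, _, f_px = depth_pro.load_rgb("example.jpg")  # hypothetical image path
image = transform(image)

# Run inference; the result includes a metric depth map.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]                  # depth map in metres
focal_length = prediction["focallength_px"]  # estimated focal length in pixels
```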
