For the last two days I have tried to wrap my head around this ACES thing and how it works in DaVinci Resolve. ACES was introduced in Resolve 8.something and is now offered as an alternative to the native YRGB math. I probably first read about ACES on the Project Mango blog, because Mango was shot with a Sony F65, which is meant to support an ACES workflow. For Mango (or Tears of Steel, as they now call it), they converted the footage from S-Log to ACES OpenEXR files and from there to rec709, which they use for compositing etc. At first I didn't pay much attention to this ACES thing, but when I found it in Resolve I decided to dig a little deeper and find out what the whole thing is about.
Basically, ACES is a color gamut that is meant to be so wide that you cannot possibly go beyond it, even with extreme grading or transforms. As I understand it, it expresses colors in XYZ space, which is device-independent and thus allows it to describe all possible colors. Images in the ACES colorspace should be scene-referred (as opposed to output-referred/display-referred), which means that the color values describe the actual light levels in the scene that the camera or other capture device received.
EDIT: ACES has RGB primaries associated with it and is thus a theoretically limited color space. It does, however, contain all colors visible to the human eye and so is unlimited in any practical terms. ACES values are relative scene exposure values - they express the real light level ratios in the filmed scene.
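To make the "RGB primaries" point concrete, here is a small sketch that derives the RGB-to-XYZ matrix from the ACES (AP0) primaries using the standard chromaticity math. The primary and white point coordinates are the published ACES values as far as I know; the derivation itself is the generic one, not anything Resolve-specific.

```python
import numpy as np

# ACES (AP0) primaries and white point (D60) as xy chromaticities.
# Note the blue primary sits below the spectral locus - it is an
# "imaginary" color, which is how ACES covers everything we can see.
primaries = {"R": (0.7347, 0.2653),
             "G": (0.0000, 1.0000),
             "B": (0.0001, -0.0770)}
white = (0.32168, 0.33767)

def xy_to_XYZ(x, y, Y=1.0):
    """Convert an xy chromaticity to XYZ tristimulus values with luminance Y."""
    return np.array([Y * x / y, Y, Y * (1.0 - x - y) / y])

# Columns of P are the XYZ coordinates of each primary at unit luminance
P = np.column_stack([xy_to_XYZ(*primaries[c]) for c in "RGB"])
W = xy_to_XYZ(*white)

# Scale each primary so that RGB = (1, 1, 1) lands exactly on the white point
S = np.linalg.solve(P, W)
M = P * S  # broadcasting scales each column by its factor

print(np.round(M, 4))  # the ACES RGB -> XYZ conversion matrix
```

Feeding RGB = (1, 1, 1) through `M` gives back the D60 white point, which is a handy sanity check that the derivation is right.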
To achieve the conversion to scene-referred values, something called an Input Device Transform (IDT) is performed. It is a transformation specific to the imaging device (for example the sensor of a certain camera model) that converts the recorded camera data to scene exposure values. This transform must be camera- and recording-method-specific because cameras tend to apply different log or gamma curves to map the most valuable data into limited bit depths. The IDT reverses these curves and the sensor response to calculate the real light levels that hit the sensor. It is a bit unclear to me whether this should all happen in the IDT or whether some additional 1D or 3D transform is needed...
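The curve-reversal part of this can be sketched in a few lines. The log curve below is entirely made up for illustration - a real IDT would use the manufacturer's published encoding curve plus a 3x3 matrix from camera-native RGB to ACES primaries - but it shows the idea: the camera squeezes exposure into code values with a log curve, and the IDT inverts it to get relative scene exposure back.

```python
import numpy as np

# Hypothetical camera log encoding (NOT any real manufacturer's curve):
# code = A * log2(linear / mid_grey) + B, clipped to [0, 1]
A, B = 0.08, 0.40     # made-up curve parameters
MID_GREY = 0.18       # scene reflectance of an 18% grey card

def encode_log(linear):
    """What the camera does: squeeze dynamic range into limited code values."""
    return np.clip(A * np.log2(linear / MID_GREY) + B, 0.0, 1.0)

def idt_to_scene_linear(code):
    """What the IDT does: invert the curve to recover relative scene exposure."""
    return MID_GREY * 2.0 ** ((code - B) / A)

grey = idt_to_scene_linear(encode_log(0.18))
one_stop_up = idt_to_scene_linear(encode_log(0.36))
print(grey, one_stop_up)  # round-trips to 0.18 and 0.36
```

The point of going through this round trip is that afterwards the values behave like light again: doubling the scene exposure doubles the stored value, which is exactly the "relative scene exposure values" property mentioned above.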
After transforming the camera data to scene-referred values, something called the Reference Rendering Transform (RRT) is performed. I haven't quite figured out how it works in Resolve, but as I understand it, it is basically an emulation of an ideal film print that has the pleasing qualities of film. How this transform is constructed and what the pleasing qualities it tries to achieve are is fuzzy to me at the moment. It is meant to replace the current practice of release print film stock emulation LUTs that are used in grading.
EDIT: The Reference Rendering Transform applies manipulations that make the image look perceptually pleasing. As film stocks have likewise been developed to look pleasing, with some colors rendered more saturated than others, comparing the RRT with an ideal film stock is logical.
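To get a feel for what a film-like rendering transform does to tones, here is a toy stand-in. This is not the actual ACES RRT (which is a published spline with additional color handling); it is Narkowicz's well-known one-line curve fit approximating the combined ACES RRT+ODT look, which is enough to show the characteristic S-shape: a gentle toe in the shadows and a soft highlight roll-off instead of a hard clip.

```python
import numpy as np

def toy_filmic(x):
    """Filmic S-curve mapping scene-linear exposure to a 0..1 display range.
    Constants are from Narkowicz's curve fit approximating ACES RRT+ODT."""
    a, b, c, d, e = 2.51, 0.03, 2.43, 0.59, 0.14
    return np.clip((x * (a * x + b)) / (x * (c * x + d) + e), 0.0, 1.0)

exposures = np.array([0.0, 0.18, 1.0, 4.0, 16.0])
print(toy_filmic(exposures))
# mid-grey lands around the middle of the range, while values many
# stops above it roll off smoothly toward 1.0 instead of clipping
```

Compare this with simply clipping scene-linear values at 1.0: there, everything above diffuse white would slam to flat white, which is exactly the harshness the "ideal film print" idea is meant to avoid.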
For viewing the "ideal film stock" on screen, an Output Device Transform (ODT) is performed. The ODT transforms the image to whatever output device you are viewing the image on. Without the RRT and ODT you cannot display ACES data. The ODT is device-specific and maps the ideal image to the device's gamut and gamma. For example, if you view the image on a P3 DCI projector, the ODT makes sure you get the right mapping to the gamut used in the projector. The same goes for rec709 or any other way of displaying images.
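The "gamma" half of an ODT is easy to show concretely. The function below is the standard Rec.709 transfer function (a short linear segment near black, then a power curve), which is what a rec709 ODT's final encoding step amounts to; the gamut-mapping half - a 3x3 matrix from ACES primaries to rec709 primaries - is omitted here to keep the sketch short.

```python
import numpy as np

def rec709_oetf(L):
    """Encode display-linear light (0..1) with the standard Rec.709
    transfer function: 4.5*L below 0.018, a 0.45 power curve above."""
    L = np.asarray(L, dtype=float)
    return np.where(L < 0.018,
                    4.5 * L,
                    1.099 * np.power(L, 0.45) - 0.099)

print(rec709_oetf([0.0, 0.01, 0.18, 1.0]))
# black stays black, white stays white, and mid tones are lifted the
# way a rec709 display expects
```

A P3 DCI ODT would do the analogous thing with the P3 primaries and a pure 2.6 gamma instead - same structure, different target device, which is the whole point of making the ODT a separate, swappable step.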
In addition to the IDT and ODT, operations called the Input Device Calibration Transform (IDCT) and its output-side counterpart can be performed. Calibration transforms are meant to even out the differences between individual units of the same device. For example, two different Alexa cameras might have slightly different sensor response curves, and the IDCT is meant to remove that difference between the cameras. These transforms are unit-specific and are not shared. They probably won't be used much, because to construct such a transform you have to run your cameras and output devices through some heavy measurements.