A CUDA-based video encoder built on NVIDIA’s Video Codec SDK performs real-time H.264 encoding on the GPU's dedicated NVENC hardware. Raw frames in NV12 format are allocated in device memory with pitch alignment for efficient access. Frames are transferred asynchronously using CUDA streams and passed to the encoder through the SDK's API calls. Output packets are collected and written to a file or passed to a muxer. Transfers and encoding are synchronized at well-defined points to keep latency low and performance consistent.

Environment Requirements

Before building the encoder, verify the following dependencies:
CUDA Toolkit (version ≥ 11.0): Provides the nvcc compiler, the CUDA runtime APIs, and the device functions used for memory allocation and synchronization.
NVIDIA GPU with NVENC support: Required to access the hardware encoder engine. Turing (RTX 20xx), Ampere, or later GPUs are recommended for best performance.
NVIDIA Video Codec SDK: Includes the NVENC headers and C++ wrapper classes such as NvEncoderCuda, which simplify driving NVENC from CUDA applications.
C++17-compatible compiler: Needed to compile the application and the SDK's NVENC wrapper sources.
CUDA driver and runtime: Both must be properly installed, and the driver must be at least the minimum version required by the CUDA Toolkit and by the Video Codec SDK release you build against.

Verify GPU support:
    nvidia-smi -q
Look for the Encoder and Decoder utilization fields in the output to confirm NVENC capability.

Raw Frame Preparation in NV12 Format

The encoder accepts video frames in NV12 format: a full-resolution Y (luma) plane followed by a half-height plane of interleaved U and V (chroma) samples. Allocate the frame in GPU memory with cudaMallocPitch() so that every row starts at a correctly aligned address.
    uint8_t* d_nv12;   // device pointer to the NV12 frame
    size_t pitch;      // row stride in bytes chosen by the driver (>= width)
    cudaMallocPitch(&d_nv12, &pitch, width, height * 3 / 2);
d_nv12: Pointer to device memory.
pitch: Aligned row stride, which enables coalesced memory access.
height * 3 / 2: Row count covering the Y plane (height rows) plus the UV plane (height / 2 rows).
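To make the NV12 layout concrete, the following host-side sketch computes plane offsets inside a pitched NV12 buffer and converts a planar I420 frame (the yuv420p layout many software decoders produce) into NV12. Nv12Layout and i420ToNv12 are hypothetical helpers, not part of CUDA or the Video Codec SDK, and the destination here is ordinary host memory standing in for the pitched device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte layout of one NV12 frame in a pitched buffer: `height` rows of Y,
// followed by `height / 2` rows of interleaved U,V pairs, each row padded
// to `pitch` bytes (the kind of value cudaMallocPitch returns).
struct Nv12Layout {
    std::size_t width;
    std::size_t height;
    std::size_t pitch;

    // Total allocation size: pitch * (height * 3 / 2).
    std::size_t frameBytes() const { return pitch * (height * 3 / 2); }
    // The UV plane begins immediately after the last luma row.
    std::size_t uvOffset() const { return pitch * height; }
    // Byte offset of the luma sample at pixel (x, y).
    std::size_t lumaOffset(std::size_t x, std::size_t y) const { return y * pitch + x; }
    // Byte offset of the U sample shared by the 2x2 block containing (x, y);
    // the matching V sample is the next byte.
    std::size_t chromaOffset(std::size_t x, std::size_t y) const {
        return uvOffset() + (y / 2) * pitch + (x / 2) * 2;
    }
};

// Hypothetical helper: converts a tightly packed planar I420 frame
// (full Y plane, then quarter-size U plane, then quarter-size V plane)
// into NV12 inside a destination buffer of layout.frameBytes() bytes.
// Padding bytes at the end of each row are left untouched.
void i420ToNv12(const std::uint8_t* i420, const Nv12Layout& layout, std::uint8_t* dst) {
    const std::size_t w = layout.width;
    const std::size_t h = layout.height;
    assert(w % 2 == 0 && h % 2 == 0 && layout.pitch >= w);

    const std::uint8_t* srcY = i420;
    const std::uint8_t* srcU = srcY + w * h;
    const std::uint8_t* srcV = srcU + (w / 2) * (h / 2);

    // Luma: copy row by row so that each row starts at a multiple of the pitch.
    for (std::size_t y = 0; y < h; ++y) {
        std::memcpy(dst + layout.lumaOffset(0, y), srcY + y * w, w);
    }
    // Chroma: interleave one U and one V byte per 2x2 block of pixels.
    for (std::size_t cy = 0; cy < h / 2; ++cy) {
        std::uint8_t* row = dst + layout.uvOffset() + cy * layout.pitch;
        for (std::size_t cx = 0; cx < w / 2; ++cx) {
            row[2 * cx] = srcU[cy * (w / 2) + cx];
            row[2 * cx + 1] = srcV[cy * (w / 2) + cx];
        }
    }
}
```

For a 4x2 frame with an 8-byte pitch, frameBytes() is 24 and the UV plane starts at byte 16; each 2x2 pixel block shares one U/V pair, which is why chromaOffset() halves both coordinates. The same offsets apply to d_nv12 on the GPU, using the pitch returned by cudaMallocPitch().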
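Transfer the NV12 frame from host to device:
    cudaMemcpy2D(d_nv12, pitch, h_nv12, width, width, height * 3 / 2, cudaMemcpyHostToDevice);
d_nv12: Destination in device memory (row stride pitch).
h_nv12: Source frame in host memory (tightly packed, row stride width).
cudaMemcpyHostToDevice: Direction of the copy.
Allocate h_nv12 as pinned (page-locked) memory with cudaHostAlloc() for higher transfer throughput.

NVENC Encoder Initialization

Use NvEncoderCuda from the SDK to interface with the NVENC API. Encoder settings are configured through NV_ENC_INITIALIZE_PARAMS and NV_ENC_CONFIG. CreateDefaultEncoderParams() fills both structures with the defaults for a chosen codec, preset, and tuning mode (including encodeWidth and encodeHeight), after which individual fields can be overridden. cudaCtx is the CUcontext the encoder runs in.
    NvEncoderCuda encoder(cudaCtx, width, height, NV_ENC_BUFFER_FORMAT_NV12);

    NV_ENC_INITIALIZE_PARAMS initParams = { NV_ENC_INITIALIZE_PARAMS_VER };
    NV_ENC_CONFIG encConfig = { NV_ENC_CONFIG_VER };
    initParams.encodeConfig = &encConfig;   // must be set before CreateDefaultEncoderParams()

    // H.264, preset P3, tuned for low latency.
    encoder.CreateDefaultEncoderParams(&initParams, NV_ENC_CODEC_H264_GUID,
                                       NV_ENC_PRESET_P3_GUID, NV_ENC_TUNING_INFO_LOW_LATENCY);
    initParams.frameRateNum = 30;           // frame rate = frameRateNum / frameRateDen
    initParams.frameRateDen = 1;
    initParams.enablePTD = 1;               // let NVENC decide picture types

    // Apply rate-control settings to encConfig here (see Bitrate and Quality Configuration).

    encoder.CreateEncoder(&initParams);
NV_ENC_CODEC_H264_GUID: Sets the codec (encodeGUID) to H.264.
NV_ENC_PRESET_P3_GUID: Sets the preset (presetGUID). Presets P1 (fastest) through P7 (highest quality) control the speed/quality tradeoff.
NV_ENC_TUNING_INFO_LOW_LATENCY: Tunes the preset for low latency. This replaces the legacy NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID preset, which is deprecated in recent SDK releases.
enablePTD: Enables picture-type decision, so the encoder chooses I/P/B frame types itself.
CreateEncoder(): Initializes the encoding session with the final parameters.

Frame Submission and Encoding

Each frame is submitted through the encoder's input buffer queue: obtain the next input frame, copy the prepared NV12 data into it, and call EncodeFrame().
    const NvEncInputFrame* inputFrame = encoder.GetNextInputFrame();
    NvEncoderCuda::CopyToDeviceFrame(cudaCtx, h_nv12, 0,
        (CUdeviceptr)inputFrame->inputPtr, inputFrame->pitch,
        width, height, CU_MEMORYTYPE_HOST,
        inputFrame->bufferFormat, inputFrame->chromaOffsets, inputFrame->numChromaPlanes);

    std::vector<std::vector<uint8_t>> vPackets;
    encoder.EncodeFrame(vPackets);
GetNextInputFrame(): Returns the next available input slot in the encoder's buffer queue.
CopyToDeviceFrame(): Copies the NV12 frame into the encoder input. Here the source is host memory with a pitch of 0 (tightly packed rows); a device buffer such as d_nv12 can be used instead by passing its pitch and CU_MEMORYTYPE_DEVICE.
EncodeFrame(): Encodes the frame and fills vPackets with any output that is ready. Because the encoder pipelines frames internally, vPackets can be empty for the first few calls.

Handling the Encoded Output

Encoded H.264 packets are written directly to a file or passed to a muxer (e.g., FFmpeg) for containerization. Each packet holds one or more NAL units in Annex B format (each prefixed with a start code), which can be parsed for SPS/PPS headers or frame boundaries. Here output_file is assumed to be a std::ofstream opened with std::ios::binary.
    for (const auto& pkt : vPackets) {
        output_file.write(reinterpret_cast<const char*>(pkt.data()), pkt.size());
    }
pkt.data(): Pointer to the byte array holding the encoded data.
pkt.size(): Number of bytes in the packet.
After the last frame, call EndEncode() to flush frames still held by the encoder (delayed B-frames or lookahead frames), then write the returned packets with the same loop:
    encoder.EndEncode(vPackets);
EndEncode(): Signals the end of the stream and returns the remaining delayed packets so that all data is processed.

Bitrate and Quality Configuration

Rate-control modes such as CBR (constant bitrate) or ConstQP (fixed quantization) are selected through encConfig.rcParams. Set them after CreateDefaultEncoderParams() and before CreateEncoder(), because the session uses the configuration it was created with. For streaming, configure averageBitRate and vbvBufferSize to balance quality against bandwidth.
    encConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CBR;
    encConfig.rcParams.averageBitRate = 4000000;   // 4 Mbit/s
    encConfig.rcParams.maxBitRate = 4000000;
    encConfig.rcParams.vbvBufferSize = 4000000;    // bits: one second of data at 4 Mbit/s
rateControlMode: CBR keeps the output bitrate constant.
averageBitRate: Target bitrate in bits per second.
maxBitRate: Peak bitrate; it applies to VBR modes and is ignored in CBR.
vbvBufferSize: Size of the VBV buffer in bits, which bounds how far the instantaneous bitrate may deviate from the target. For low-latency streaming, a smaller buffer such as one frame's worth (averageBitRate / frame rate, about 133,333 bits at 4 Mbit/s and 30 fps) lowers latency at some cost in quality.

For fixed QP:
    encConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CONSTQP;
    encConfig.rcParams.constQP.qpIntra = 23;
    encConfig.rcParams.constQP.qpInterP = 25;
CONSTQP: Disables rate control and uses fixed quantizers.
qpIntra: QP for I-frames.
qpInterP: QP for P-frames (set qpInterB as well if B-frames are enabled).
Lower QP values give higher visual quality and larger output; higher values favor compression efficiency.

GPU Synchronization and Streams

CUDA streams enable asynchronous memory transfers and concurrent execution.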
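The effect of this overlap can be estimated with a toy timing model. The sketch below is plain C++, not CUDA code: pipelineMakespan() is a hypothetical function, and the copy and encode durations are arbitrary illustrative units rather than measurements. It assumes each frame is uploaded into one of numBuffers device staging buffers and then encoded; an upload cannot reuse a buffer until the frame previously stored there has been encoded, and encoding waits for its upload to finish, mirroring the cudaStreamSynchronize() call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy two-stage pipeline model (upload, then encode). Returns the time at
// which the last of numFrames frames finishes encoding.
double pipelineMakespan(std::size_t numFrames, double copyTime,
                        double encodeTime, std::size_t numBuffers) {
    assert(numBuffers >= 1);
    std::vector<double> encodeDone(numFrames, 0.0);
    double copyFree = 0.0;     // when the copy engine can start the next upload
    double encoderFree = 0.0;  // when the encoder can accept the next frame

    for (std::size_t i = 0; i < numFrames; ++i) {
        // Buffer slot i % numBuffers is reusable only after the frame that
        // last occupied it (frame i - numBuffers) has been encoded.
        double slotFree = (i >= numBuffers) ? encodeDone[i - numBuffers] : 0.0;
        double copyStart = std::max(copyFree, slotFree);
        double copyEnd = copyStart + copyTime;
        copyFree = copyEnd;

        // Encoding waits for this frame's upload and for the previous encode.
        double encodeStart = std::max(copyEnd, encoderFree);
        encodeDone[i] = encodeStart + encodeTime;
        encoderFree = encodeDone[i];
    }
    return numFrames == 0 ? 0.0 : encodeDone.back();
}
```

With 4 frames, a copy time of 2, and an encode time of 3, a single buffer gives a total of 20 units, the same as running every step back to back, because the next upload must wait for the current encode. Two buffers reduce it to 14, matching the pipelined bound copy + encode + (frames - 1) * max(copy, encode); a third buffer does not help here because the encoder is already the bottleneck. In this model, overlap requires at least two frame buffers, not just an asynchronous stream.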
Use cudaMemcpy2DAsync() to overlap frame uploads with encoding work, and call cudaStreamSynchronize() before encoding to ensure the data is available. Overlapping the upload of the next frame with the encoding of the current one keeps both the copy engine and NVENC busy, which reduces idle time and raises throughput. The host buffer must be pinned (allocated with cudaHostAlloc()) for the copy to overlap with other work; copies from pageable memory are staged by the driver and largely lose that benefit.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
Perform an asynchronous copy:
    cudaMemcpy2DAsync(d_nv12, pitch, h_nv12, width, width, height * 3 / 2, cudaMemcpyHostToDevice, stream);
cudaMemcpy2DAsync: Enqueues a non-blocking memory transfer and returns immediately.
stream: The CUDA stream on which the copy is queued.
Synchronize before encoding:
    cudaStreamSynchronize(stream);
This blocks until the transfer has completed, so d_nv12 is safe to use for encoding.

Cleanup and Resource Release

Destroy the encoder session with DestroyEncoder() to free NVENC resources. Release the GPU memory allocated for frames with cudaFree() and destroy CUDA streams with cudaStreamDestroy(). Proper cleanup prevents memory leaks and keeps long-running applications stable.
    encoder.DestroyEncoder();
    cudaFree(d_nv12);
    cudaStreamDestroy(stream);
DestroyEncoder(): Releases the encoder session and the input/output buffers it allocated.
cudaFree(): Frees the pitched NV12 buffer.
cudaStreamDestroy(): Destroys the CUDA stream.
If h_nv12 was allocated with cudaHostAlloc(), release it with cudaFreeHost().
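Releasing resources on every path, including error paths, is easier with RAII. The SDK's C++ wrappers report failures by throwing exceptions, so cleanup calls placed at the end of a function are bypassed when an error occurs. The sketch below uses a small, hypothetical ScopeGuard class; the log strings stand in for the real calls (cudaMallocPitch()/cudaFree(), CreateEncoder()/DestroyEncoder(), cudaStreamCreate()/cudaStreamDestroy()) so that the example compiles without CUDA.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard: runs a cleanup callback when it goes out of scope,
// whether the scope is left normally or by an exception.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> fn) : fn_(std::move(fn)) {
        assert(fn_ && "cleanup callback must not be empty");
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() { fn_(); }

private:
    std::function<void()> fn_;
};

// Simulated encoder session. Each push_back stands in for a real CUDA or
// NvEncoderCuda call; guards release resources in reverse acquisition order.
std::vector<std::string> runSession(bool failMidway) {
    std::vector<std::string> log;
    try {
        log.push_back("alloc frame");      // stands in for cudaMallocPitch()
        ScopeGuard freeFrame([&] { log.push_back("free frame"); });          // cudaFree()

        log.push_back("create encoder");   // stands in for CreateEncoder()
        ScopeGuard destroyEnc([&] { log.push_back("destroy encoder"); });    // DestroyEncoder()

        log.push_back("create stream");    // stands in for cudaStreamCreate()
        ScopeGuard destroyStream([&] { log.push_back("destroy stream"); });  // cudaStreamDestroy()

        if (failMidway) {
            throw std::runtime_error("encode failed");  // simulated SDK error
        }
        log.push_back("encode");
    } catch (const std::exception&) {
        // By the time the handler runs, all guards in the try block have fired.
        log.push_back("error handled");
    }
    return log;
}
```

runSession(false) records the three acquisitions, the encode step, and then the releases in reverse order; runSession(true) records the same releases before the error handler runs. In production code each lambda would call the corresponding CUDA or NvEncoderCuda function; std::unique_ptr with a custom deleter is an equally idiomatic choice for pointer-like resources such as d_nv12.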