Memory Usage Over Capacity error in Stream Analytics

h85824 0 Reputation points
2025-03-11T12:30:33.31+00:00

Hi Team, I have stream analytics setup which is consuming data from an EventHub, I am constantly getting the below error message :

The memory usage is over the capacity for one or more of the query steps. Event processing may be delayed or stop making progress. This may be a result of large window in your query, large events in your input, large out of order tolerance window, or a combination of the above. Please try to partition your query, or break down your query to more steps, and add Streaming Unit resources from the Scale tab to avoid such condition.,

Sharing details of stream analytics Details Below :Screenshot 2025-03-11 at 8.25.40 AM Screenshot 2025-03-11 at 8.22.48 AM Query :

WITH OneEvent AS 
(
    SELECT COUNT(*) As eventCount, System.Timestamp as EndOfWindow
    FROM [inEventHub] TIMESTAMP BY EventEnqueuedUtcTime
    GROUP BY TumblingWindow(hh, 60)  -- Reduced window size
),
AllReferenceDevices AS 
(
    SELECT b.machine_id as mac_address, OneEvent.EndOfWindow
    FROM OneEvent JOIN [prodBlobReference] b
    ON OneEvent.EndOfWindow = OneEvent.EndOfWindow
),
DeviceConnectivityErrorDetection AS
(
    SELECT 'DeviceConnectivityErrorDetected' AS EventType, AllReferenceDevices.mac_address as mac_address, AllReferenceDevices.EndOfWindow as LastEventEnqueuedUtcTime
    FROM AllReferenceDevices
    LEFT JOIN [inEventHub] TIMESTAMP BY EventEnqueuedUtcTime 
    ON DATEDIFF(hh, [inEventHub], AllReferenceDevices) BETWEEN 0 and 60  -- Adjusted time difference
    AND [inEventHub].mac_address = AllReferenceDevices.mac_address
    WHERE [inEventHub] IS NULL 
)
SELECT EH.mac_address, EH.res, EH.parms
INTO [queueoutput1]
FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
JOIN prodBlobReference b 
ON b.machine_id=EH.mac_address AND b.queue_name='prod-queue1'

SELECT EH.mac_address, EH.res, EH.parms
INTO [queueoutput2]
FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
JOIN prodBlobReference b 
ON b.machine_id=EH.mac_address AND b.queue_name='prod-queue2'

SELECT EH.mac_address, EH.res, EH.parms
INTO [queueoutput3]
FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
JOIN prodBlobReference b 
ON b.machine_id=EH.mac_address AND b.queue_name='prod-queue3'

SELECT *
INTO [queueAnomalyOutput]
FROM DeviceConnectivityErrorDetection

Can you help mw identify what can be the potential reason for this error.

Thanks.

Azure Stream Analytics
Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.
390 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. J N S S Kasyap 1,715 Reputation points Microsoft External Staff
    2025-03-13T00:08:40.65+00:00

    Hi @h85824

    Azure Stream Analytics query is encountering memory overcapacity issues, likely due to inefficient query design, large window sizes, and redundant joins.

    key issues and practical solutions to optimize performance and reduce memory usage.

    A 60-minute window accumulates too much data, increasing memory usage. Reduce it to 15 or 30 minutes to improve efficiency

    The condition ON OneEvent.EndOfWindow = OneEvent.EndOfWindow is always true, making the join ineffective.Use an actual timestamp column from prodBlobReference.

    Checking missing events over 60 hours (DATEDIFF(hh, ..., 60)) leads to high memory consumption. Reduce it to 12 hours to optimize memory and processing the query.

    prodBlobReference is joined three times in different queries, increasing redundancy. Filter first using WHERE instead of filtering inside the JOIN.

    Improved Query:

    WITH OneEvent AS 
    (
        SELECT COUNT(*) AS eventCount, System.Timestamp AS EndOfWindow
        FROM [inEventHub] TIMESTAMP BY EventEnqueuedUtcTime
        GROUP BY TumblingWindow(mi, 15)  -- Efficient 15-min window
    ),
    AllReferenceDevices AS 
    (
        -- CROSS JOIN to ensure all devices are considered
        SELECT b.machine_id AS mac_address, OneEvent.EndOfWindow
        FROM OneEvent 
        CROSS JOIN [prodBlobReference] b
    ),
    DeviceConnectivityErrorDetection AS
    (
        SELECT 'DeviceConnectivityErrorDetected' AS EventType, 
               AllReferenceDevices.mac_address AS mac_address, 
               AllReferenceDevices.EndOfWindow AS LastEventEnqueuedUtcTime
        FROM AllReferenceDevices
        WHERE NOT EXISTS 
        (
            SELECT 1 
            FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
            WHERE EH.mac_address = AllReferenceDevices.mac_address
            AND ABS(DATEDIFF(hh, EH.EventEnqueuedUtcTime, AllReferenceDevices.EndOfWindow)) <= 12
        )
    )
    -- Consolidated joins for efficiency
    SELECT EH.mac_address, EH.res, EH.parms
    INTO [queueoutput1]
    FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
    JOIN [prodBlobReference] b 
    ON EH.mac_address = b.machine_id
    WHERE b.queue_name = 'prod-queue1'
    SELECT EH.mac_address, EH.res, EH.parms
    INTO [queueoutput2]
    FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
    JOIN [prodBlobReference] b 
    ON EH.mac_address = b.machine_id
    WHERE b.queue_name = 'prod-queue2'
    SELECT EH.mac_address, EH.res, EH.parms
    INTO [queueoutput3]
    FROM [inEventHub] EH TIMESTAMP BY EventEnqueuedUtcTime
    JOIN [prodBlobReference] b 
    ON EH.mac_address = b.machine_id
    WHERE b.queue_name = 'prod-queue3'
    -- Store detected connectivity errors
    SELECT * 
    INTO [queueAnomalyOutput]
    FROM DeviceConnectivityErrorDetection
    
    

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.