Spark SQL 3.2.0
Please see the above link for a simplified example of my dataset and desired outcome. In abstract, I have a dataset with a series of related events that can be grouped by their time order and event number. When ordering by time and event number, every time the event number resets to 1, you're looking at a new set of events. I understand how to use row_number() or dense_rank() to increment event_group_number where sub_event_number = 1, but I'm uncertain how to make the rows where sub_event_number > 1 take on the correct event_group_number.
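For illustration, within a single context_id/event_id the ordered rows look something like this (the values are made up, since the real data is in the linked example, and the 0-based numbering mirrors the "- 1" in my query below):

start_time | sub_event_number | desired event_group_number
---------- | ---------------- | --------------------------
09:00      | 1                | 0
09:05      | 2                | 0
09:10      | 3                | 0
09:15      | 1                | 1
09:20      | 2                | 1
09:25      | 1                | 2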
I’m currently doing the following:
case
    when sub_event_number = 1 and is_event_type
        then row_number() over (
            partition by context_id, event_id, sub_event_number
            order by is_event_type asc, start_time asc
        ) - 1
    else null
end as event_group_number
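From what I've read, the usual way to propagate a value to the sub_event_number > 1 rows is a running sum over a "new group" flag rather than row_number(). A sketch of what I mean, reusing the columns and the flag condition from my query above (untested against my real data):

sum(case when sub_event_number = 1 and is_event_type then 1 else 0 end)
    over (
        partition by context_id, event_id
        order by start_time, sub_event_number
        rows between unbounded preceding and current row
    ) - 1 as event_group_number

I'm not confident the partitioning or ordering here is right for my case, though.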
I’d be grateful for any help, and I’m happy to answer any questions. Thank you!