UKC Forums - SQL Again

SQL Again

This topic has been archived, and won't accept reply postings.

lowersharpnose 18 Sep 2014

You were good to me last time.

I have a table I use to record instrument readings or some such.

reading_id (table row id)
instrument_id
<more columns>
instrument_reading_datetime
instrument_reading

I want to prune it, so it only shows when the instrument_reading changes.
e.g. ordered by datetime, we have values like this...

100, 200, ... , 2014/01/01, 17
101, 200, ... , 2014/01/02, 18
112, 200, ... , 2014/01/03, 18 -- delete
103, 200, ... , 2014/01/04, 18 -- delete
124, 200, ... , 2014/01/05, 18 -- delete
205, 200, ... , 2014/01/06, 19
206, 200, ... , 2014/01/07, 19 -- delete
207, 200, ... , 2014/01/08, 19 -- delete
208, 200, ... , 2014/01/09, 19 -- delete
109, 200, ... , 2014/01/10, 19 -- delete
110, 200, ... , 2014/01/11, 17
111, 200, ... , 2014/01/12, 17 -- delete
112, 200, ... , 2014/01/13, 17 -- delete
113, 200, ... , 2014/01/14, 17 keep (last in series)

Can I do this with a single query?

alibaba 18 Sep 2014

select * from tbl;

1 NULL 2 100
2 NULL 2 101
3 NULL 2 101
4 NULL 2 110
5 NULL 2 110
6 NULL 2 120
7 NULL 2 120
8 NULL 2 150
9 NULL 2 170

delete r1 from tbl r1, tbl r2 where r1.id>r2.id and r1.reading = r2.reading;

select * from tbl;

1 NULL 2 100
2 NULL 2 101
4 NULL 2 110
6 NULL 2 120
8 NULL 2 150
9 NULL 2 170

Make sure you check if the meter is the same so you don't delete the same reading from two different meters.

splat2million 18 Sep 2014

In reply to lowersharpnose:

It would be very straightforward to do with multiple single queries within a while loop!

SELECT DISTINCT would help if the readings couldn't change back to an old one. Perhaps if you can recode the way the data is collected so you have a new column instrument_reading_cumulative (i.e. 17, 35, 35, 35, 35, 54, 54, 54, etc.)?

OP lowersharpnose 18 Sep 2014

In reply to alibaba:

Wouldn't that delete get rid of the row

110, 200, ... , 2014/01/11, 17

OP lowersharpnose 18 Sep 2014

In reply to splat2million:

The readings can change back to an old one.

I was assuming that I would have to write a program to do the job, but just wondered if I was missing a trick.

As for rewriting the way it is stored, well I can do that, but

a) I would prefer not to change that bit.

b) The data exists and I have to prune what is there already.

alibaba 18 Sep 2014

In reply to lowersharpnose:

No, as you can see in my example record with id 1 remained.

OP lowersharpnose 18 Sep 2014

In reply to alibaba:

Your suggestion works for your data, not for mine.

If you:

CREATE TABLE readings
(
id int auto_increment primary key,
reading int
);

INSERT INTO readings
(reading)
VALUES
(10),(10),(10),(11),(11),(11),(11),(12),(11),(11),(11),(11),
(12),(12),(12),(12),(11);

Then run your suggestion:
delete r1 from readings r1, readings r2 where r1.id>r2.id and r1.reading = r2.reading;

select * from readings order by id;

Gives
1 10
4 11
8 12

... which wrongly (for my purposes) got rid of
9 11
13 12
17 11

I am still looking.

philipivan 18 Sep 2014

In reply to lowersharpnose:
What database server are you using. In recent versions of oracle you can use lag to return the previous row(s) you can compare this to the current row and see if a change has occurred. If it hasn't you can ignore that row - although you may need a second query surrounding the analytic.

Phil

Something like

select col_date,col_value
from (
select col_date,
col_value,
decode(lag(col_value,1,0) over (order by col_date asc) - col_value,0,'N','Y') as changed
from data_imp)
where changed='Y'

Post edited at 12:27

OP lowersharpnose 18 Sep 2014

In reply to philipivan:

MySQL.

I think I will write a program.

philipivan 18 Sep 2014

In reply to lowersharpnose:

See my update about. If you have a choice you should consider using postgres sql or oraclexe. You'll get a load more functionality for free.

You could do the same in mysql but it would involve writing a subquery returning the previous row value in this case.

Alex@home 18 Sep 2014

In reply to lowersharpnose:

alibaba's just needs a small tweak. you want to look to see if the next record changes - not if any subsequent record has a value you've already seen.
think this should do what you want:

delete readings
where id in (select r1.id
from readings r1
, readings r2
where r1.id = r2.id+1
and r1.reading = r2.reading
)

Alex@home 18 Sep 2014

In reply to philipivan:

oracle lag functions was my first thought but it seems you can't use them in a delete statement

highaltitudebarista 18 Sep 2014

In reply to lowersharpnose:
I live in the SQL Server world where I would do something like the below. It looks like you don't get ranking functions in MySQL but there is a ranking variable which might help you do a similar thing?

My logical approach is that for each instrument_id/instrument_reading pairing you want to delete all but the record with the lowest instrument_reading_datetime value.

;WITH x AS
(
SELECT
reading_id.
instrument_id,
instrument_reading_datetime,
instrument_reading,
RANK() OVER(PARTITION BY instrument_id, instrument_reading ORDER BY instrument_reading_datetime ASC) Ranking

FROM readingtable
)

DELETE rt
FROM readingtable rt
LEFT JOIN x ON x.Ranking = 1 x.reading_id = rt.reading_id
WHERE x.reading_id IS NULL

Post edited at 16:44

OP lowersharpnose 18 Sep 2014

In reply to Alex@home:

It doesn't but could with another step. The table holds info on lots of instruments, in addition, the readings do not get inserted in order --> you can't use the primary key id for any usefule comparison. The datetime at which the readings was taken is what I need to use to order readings from a given instrument. I think I could just do that first.

Thanks to all for you help.

I wrote a short python script to do the looping and it has almost finished churning through the millions of rows.