CHIRTS FAQ
How CHIRTS is generated
Just before we operationalized the IRTmax process, Seth was working in the western US. There he noticed some years where the geo-registration could be off by quite a bit, most noticeably with B1V1, but B1V2 data also had some variability in pixel location on the order of a few pixels. He decided to avoid the years
1980, 1981, 1983, 1989, 1990, 1991, 1992, 1993, 1994 and 1995
There are 25 good years left between 1982 and 2014.
The attached figure shows the various geostationary sensors in use over the equator. The problem area for the US Southwest coincides with the GOES-7 period, when there was only a single GOES for the US, so it had to cover both coasts. (Figure still to be uploaded.)
This should have no bearing on Africa, which has consistent coverage when it has any; we just wanted you to know why we selected these 25 years to make a climatology.
Start by making climatologies of expected brightness temperature for each pixel, for each month, 8 times a day.
Eight times a day, at each pixel, look at a 5x5 array of pixels centered on it, over the 25 good years, over the calendar month. The first cut is to exclude any points colder than 180 K or warmer than 340 K.
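A minimal Python sketch of this first cut (the operational code is IDL); the array name, its shape, and the NaN masking convention are assumptions for illustration:

 import numpy as np
 
 # Sketch only: "cube" is assumed to hold Tb in kelvin with shape
 # (days, years, 5, 5) for one pixel/month/hour; NaN marks excluded points.
 def first_cut(cube, lo=180.0, hi=340.0):
     """Exclude any points colder than 180 K or warmer than 340 K."""
     cube = np.asarray(cube, dtype=float)
     return np.where((cube < lo) | (cube > hi), np.nan, cube)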
Look at the histogram of all points (binned every 3 degrees to avoid spurious maxima). The goal is to figure out a mathematical way to trim the high and low outliers so that we are identifying actual anomalies from actual data, rather than identifying cloudiness, misregistration, sensor calibration errors, etc. A histogram of all of the temperatures for a given pixel, for a given month, for a given hour will have a really long tail on the low end and then a near-Gaussian distribution of good temperatures at the high end (cloud tops are colder than land); the low end is all of the solidly cloudy to partly cloudy data values.

My first cut at this was to fit a Gaussian to the entire distribution (which was dominated by the Gaussian good values) and only keep values within +/- 3 standard deviations of the mean. This works fine for Africa as long as the number of good values is much greater than the number of cloudy values. For the western US, especially in winter, the overall distributions become distinctly non-Gaussian because there are so many cloudy data points. The second cut seeks to isolate the Gaussian part of the distribution by finding the upper threshold (+3 SD) and the center point of the Gaussian part (the mean of the histogram), and then calculating the lower threshold (-3 SD) as center - (threshHigh - center).
- We find the center by looking for the first local maximum when scanning from the right side of the distribution. We do this because there can be bimodal distributions, with the lower-temperature peak being the erroneous one.
- To find the +3 SD upper threshold, I calculate the 99th percentile of the entire distribution. There are some tricks here: you have to trim erroneous high values before calculating the 99th percentile, or else you will get a really high threshHigh. I isolate the true end of the histogram by looking for 4 bins in a row having >10 counts, followed by 2 bins with counts of zero. In the cloudy tail there can be bins with values, then no values, then values again, so if there are multiple instances of the xxxxoo pattern I take the higher instance. This works fine for most of the world; when Pete ran the code on the world from 70N to 70S there were some errors in Siberia, Greenland, and Antarctica (see "Fixes to make" below).
So we have a symmetric, Gaussian-like distribution of good values. Getting the clouds out is incredibly important, obviously, and this works pretty well.
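A minimal Python sketch of the windowing just described (the operational code is IDL); the function name and binning details are illustrative, and it assumes the erroneous warm tail has already been trimmed as described under "Fixes to make" below:

 import numpy as np
 
 def gaussian_window(values, bin_width=3.0):
     """Return (thresh_lo, center, thresh_hi) bracketing the warm,
     Gaussian-like part of a Tb distribution (values in kelvin)."""
     values = values[np.isfinite(values)]
     edges = np.arange(values.min(), values.max() + bin_width, bin_width)
     counts, edges = np.histogram(values, bins=edges)
     mids = 0.5 * (edges[:-1] + edges[1:])
 
     # Center: first local maximum scanning from the warm (right) end,
     # so a colder, cloud-contaminated mode cannot be picked.
     center = mids[-1]
     for i in range(len(counts) - 2, 0, -1):
         if counts[i] >= counts[i - 1] and counts[i] >= counts[i + 1]:
             center = mids[i]
             break
 
     # +3 SD stand-in: the 99th percentile (erroneous warm values are
     # assumed already trimmed, per "Fixes to make").
     thresh_hi = np.percentile(values, 99)
     # -3 SD: mirror the upper threshold about the center.
     thresh_lo = center - (thresh_hi - center)
     return thresh_lo, center, thresh_hi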
Use the maximum found and mirror the right-hand side of the distribution onto the left-hand side of the maximum. This acts to some degree as a cloud mask. We are left with a 4-dimensional array,
31 days, 25 years, 5, 5
Take the maximum over the 5x5 neighborhoods,
then the median over the 25 years, then the median over all days in the month.
This results in 8 maps (00, 03, 06, 09, 12, 15, 18, and 21), one for each of the B1 observation times in a day.
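A minimal Python sketch of this reduction, assuming "masked" is the (31 days, 25 years, 5, 5) array with cloudy and out-of-window values set to NaN (the name and NaN convention are illustrative):

 import numpy as np
 
 def hourly_climatology(masked):
     """Collapse a (31 days, 25 years, 5, 5) cube to one expected Tb."""
     m = np.nanmax(masked, axis=(2, 3))  # maximum over each 5x5 neighborhood
     m = np.nanmedian(m, axis=1)         # median over the 25 years
     return np.nanmedian(m, axis=0)      # median over all days in the month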
Now we go back to each day (8 maps). At each point, take the maximum of the 8 clim maps, maxclim.
For each of the 8 obs/day, take an anomaly, Tb - maxclim.
Take the maximum of the anomaly over the day, maxanom.
Add these two together to get IRTmax for the day:
IRTmax = maxclim + maxanom.
The idea here is that if it is cloudy at noon but clear at some other part of the day, we use the clear values to estimate IRTmax.
This is run for all years, 1980 to 2015.
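A minimal Python sketch of the daily step, assuming "tb_day" holds the eight observations for one day and "clim" the eight hourly climatology maps, both shaped (8, rows, cols), with cloudy or missing observations as NaN (names and masking are illustrative):

 import numpy as np
 
 def irtmax_daily(tb_day, clim):
     """IRTmax = maxclim + maxanom for one day."""
     maxclim = np.nanmax(clim, axis=0)    # maximum of the 8 clim maps
     anom = tb_day - maxclim              # anomaly of each obs: Tb - maxclim
     maxanom = np.nanmax(anom, axis=0)    # maximum anomaly over the day;
                                          # NaN (cloudy) hours are skipped,
                                          # so clear hours fill in
     return maxclim + maxanom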
That's a very rough draft of what we do.
All of this is up for discussion/improvements.
Daily SADC .bils are ready for January-April, July, and August here:
ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/experimental/Tmax/daily/SADC.bils/
Will fill in the rest of the months as they become available.
These are hot off the press. No one has taken a look at these yet.... Have fun.
Fixes to make
One of the first steps in the process involves generating a histogram, per pixel, of all of the Tb values through time. Then we identify erroneous high values that would lead to a higher-than-correct upper threshold. The first way we did this was to locate the end of the real histogram by finding 4 bins in a row with good counts, followed by two zero bins. This breaks down in Siberia/Greenland/Antarctica: if the xxxxoo search string isn't found, the histogram isn't modified. I added 2 if statements to clarify things: if there is no xxxxoo, allow xxxxo; if not that, allow xxxxoo where x just has to be >0 rather than >10.
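A minimal Python sketch of this search with the two fallbacks (the helper name and return convention are illustrative; the operational code is IDL):

 import numpy as np
 
 def histogram_end(counts):
     """Index of the last real bin: 4 occupied bins then empty bin(s)."""
     counts = np.asarray(counts)
 
     def find(x_min, n_zero):
         hit = -1
         for i in range(len(counts) - 4 - n_zero + 1):
             occupied = np.all(counts[i:i + 4] > x_min)
             empty = np.all(counts[i + 4:i + 4 + n_zero] == 0)
             if occupied and empty:
                 hit = i + 3  # keep the highest (warmest) instance
         return hit
 
     # Original pattern (xxxxoo with x > 10), then the two fallbacks.
     for x_min, n_zero in ((10, 2), (10, 1), (0, 2)):
         hit = find(x_min, n_zero)
         if hit >= 0:
             return hit
     return len(counts) - 1   # pattern not found: leave histogram unmodified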
Where the code resides in Linux
Hi Seth,
My IDL code for IRTmax lives here,
/home/source/pete/IRT/seth-tmax
The month cubes by hour live here,
/home/sandbox3/B1-GridSat/tifs/monthcubes/
IRTmax daily GeoTiffs (1/2 Tb),
/home/sandbox3/B1-GridSat/tifs/adjTmax.daily
IRTmax monthly GeoTiffs,
/home/IRT/Tmax/monthly
with subdirs,
/africa /anom
Various IRTmax PNGs (stdev, counts, more to come...),
/home/IRT/pngs/monthlyIRTmax
There you have it.
Take care,
-pete