I blogged before about the study by David Spence and colleagues, published online in July 2012 in the journal Atherosclerosis (). This study attracted a lot of media attention (e.g., ). The article is titled: “Egg yolk consumption and carotid plaque”. The study argues that “regular consumption of egg yolk should be avoided by persons at risk of cardiovascular disease”. It hints at egg yolks being unhealthy in general, possibly even more so than cigarettes.
I used the numbers in Table 2 of the article (only 5 rows of data, one per quintile; i.e., N=5) to conduct a type of analysis that is rarely, if ever, conducted in health studies – a moderating effects analysis. A previous blog post summarizes the results of one such analysis using WarpPLS (). It looked into the effect of the number of eggs consumed per week on the association between blood LDL cholesterol and plaque (carotid plaque). The conclusion, which is admittedly tentative due to the small sample (N=5), was that plaque decreased as LDL cholesterol increased with consumption of 2.3 eggs per week or more ().
Recently I ran an analysis on the moderating effect of number of eggs consumed per week on the association between cumulative smoking (measured in “pack years”) and plaque. As it turns out, if you fit a 3D surface to the five data points that you get for these three variables from Table 2 of the article, you end up with a relatively smooth surface. Below is a 3D plot of the 5 data points, followed by a best-fitting 3D surface (developed using an experimental algorithm).
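A surface fit of this general kind can be sketched with ordinary least squares on a bilinear model. To be clear, this is not the experimental algorithm mentioned above, just a simple stand-in; and the five data points below are made up for illustration, not the values from Table 2 of the article.

```python
import numpy as np

# Hypothetical illustration only -- NOT the values from Table 2 of the
# article. Five (smoking, eggs, plaque) points generated from a known
# bilinear surface, so we can check that the fit recovers it.
smoking = np.array([10.0, 12.0, 14.0, 15.5, 16.0])  # pack-years
eggs = np.array([1.0, 1.5, 2.0, 2.3, 2.7])          # eggs per week
plaque = 5.0 + 6.0 * smoking + 3.0 * eggs + 0.5 * smoking * eggs

# Design matrix for the surface z = b0 + b1*x + b2*y + b3*x*y.
X = np.column_stack([np.ones_like(smoking), smoking, eggs, smoking * eggs])
coef, *_ = np.linalg.lstsq(X, plaque, rcond=None)
print(np.round(coef, 3))  # recovers approximately [5, 6, 3, 0.5]
```

With only five points and four coefficients there is barely one degree of freedom to spare, which is one more reason any conclusion drawn from such a fit has to remain tentative.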
Based on this best-fitting surface you could then generate a contour graph, shown below. The “lines” are called “isolines”. Each isoline refers to plaque values that are constant for a set of eggs per week and cumulative smoking combinations. Next to the isolines are the corresponding plaque values. The first impression is indeed that both egg consumption and smoking are causing plaque buildup, as plaque clearly increases as one moves toward the top-right corner of the graph.
But focus your attention on each individual isoline, one at a time. It is clear that plaque remains constant for increases in cumulative smoking, as long as egg consumption increases. Take for example the isoline that refers to 120 mm2 of plaque area. An increase in cumulative smoking from about 14.5 to 16 pack years leads to no increase in plaque if egg consumption goes up from about 2 to 2.3 eggs per week.
These within-isoline trends, which are fairly stable across isolines (they are all slanted to the right), clearly contradict the idea that eggs cause plaque buildup. So, why does plaque buildup seem to clearly increase with egg consumption? Here is a good reason: egg consumption is very strongly correlated with age, and plaque increases with age. The correlation is a whopping 0.916. And I am not talking about cumulative egg consumption, which the authors also measure, through a variable called “egg-yolk years”. No, I am talking about eggs per week. In this dataset, older folks were eating more eggs, period.
The correlation between plaque and age is even higher: 0.977. Given this, it makes sense to look at individual isolines. This is analogous to what biostatisticians often call “adjusting for age”, or analyzing the effect of egg consumption on plaque buildup while “keeping age constant”. A different technique is to “control for age”; that technique would have been preferable had the correlations been lower (say, below 0.7), since collinearity would then have remained within acceptable levels.
The underlying logic of the “keeping age constant” technique is fairly sound in the face of such a high correlation, which would make “controlling for age” very difficult due to collinearity. When we “keep age constant”, the results point at egg consumption being protective among smokers.
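The age-confounding argument can be illustrated with a small simulation. The numbers below are made up (the coefficients and noise levels are mine, not the study's): age drives both egg consumption and plaque, which makes the raw eggs-plaque correlation look alarming, while the age-adjusted (partial) correlation is close to zero.

```python
import numpy as np

# Hypothetical illustration -- made-up numbers, not the study's data.
# Age drives BOTH egg consumption and plaque, so the raw eggs-plaque
# correlation looks alarming even though eggs have no effect at all here.
rng = np.random.default_rng(0)
age = rng.uniform(40, 75, 200)
eggs = 0.05 * age + rng.normal(0, 0.2, 200)    # eggs track age closely
plaque = 2.0 * age + rng.normal(0, 5.0, 200)   # plaque driven by age only

def partial_corr(x, y, z):
    """Correlation between x and y after regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

raw = np.corrcoef(eggs, plaque)[0, 1]
adj = partial_corr(eggs, plaque, age)
print(round(raw, 2), round(adj, 2))  # raw is high; adjusted is near zero
```

This also shows why very high collinearity makes “controlling for age” fragile: as the eggs-age correlation approaches 1, the residuals being correlated are almost pure noise, and the adjusted estimate becomes unstable.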
But diehard fans of the idea that eggs are unhealthy could explain the results differently. Maybe egg consumption causes plaque to go up, but smoking has a protective effect. Again taking the isoline that refers to 120 mm2 of plaque area, these diehard fans could say that an increase in egg consumption from 2 to 2.3 eggs per week leads to no increase in plaque if cumulative smoking goes up from about 14.5 to 16 pack years.
Not too long ago I also blogged about a medical case study of a man who ate approximately 25 eggs (20 to 30) per day for over 15 years (probably well over), was almost 90 years old (88) when the case was published in the prestigious The New England Journal of Medicine, and was in surprisingly good health (). This man was not a smoker.
Perhaps if this man smoked 25 cigarettes per day, and ate no eggs, he would be in even better health eh!?
Monday, December 24, 2012
Monday, October 1, 2012
The anatomy of a VAP test report
The vertical auto profile (VAP) test is an enhanced lipid profile test. It has been proposed, chiefly by the company Atherotech (), as a more complete test that relies on direct measurement of previously calculated lipid measures. The VAP test is particularly known for providing direct measurements of LDL cholesterol, instead of calculating them through equations ().
At the time of this writing, a typical VAP test report would provide direct measures of the cholesterol content of LDL, Lp(a), IDL, HDL, and VLDL particles. It would also provide additional measures referred to as secondary risk factors, notably particle density patterns and apolipoprotein concentrations. Finally, it would provide a customized risk summary and some basic recommendations for treatment. Below is the top part of a typical VAP test report (from Atherotech), showing measures of the cholesterol content of various particles. LDL cholesterol is combined for four particle subtypes, the small-dense subtypes 4 and 3, and the large-buoyant subtypes 2 and 1. A breakdown by LDL particle subtype is provided later in the VAP report.
In the table above, HDL cholesterol is categorized into two subtypes: the large-buoyant subtype 2 and the small-dense subtype 3. Interestingly, most of the HDL cholesterol in the table is supposedly of the least protective subtype, which seems to be a common finding in the general population. VLDL cholesterol is categorized in a similar way. IDL stands for intermediate-density lipoprotein; this is essentially a VLDL particle that has given off some of its content, particularly its triglyceride (or fat) cargo, but still remains in circulation.
Lp(a) is a special subtype of the LDL particle that is purported to be associated with markedly atherogenic factors. Mainstream medicine generally considers Lp(a) particles themselves to be atherogenic, which is highly debatable. Among other things, cardiovascular disease (CVD) risk and Lp(a) concentration follow a J-curve pattern, and Lp(a)’s range of variation in humans is very large. A blog post by Peter (Hyperlipid) has a figure right at the top that illustrates the former J-curve assertion (). The latter fact, related to range of variation, generally leads to a rather wide normal distribution of Lp(a) concentrations in most populations; meaning that a large number of individuals tend to fall outside Lp(a)’s optimal range and still have a low risk of developing CVD.
Below is the middle part of a typical VAP report, showing secondary risk factors, such as particle density patterns and apolipoprotein concentrations. LDL particle pattern A is considered to be the most protective, supposedly because large-buoyant LDL particles are less likely to penetrate the endothelial gaps, which are about 25 nm in diameter. Apolipoproteins are proteins that bind to fats for their transport in lipoproteins, to be used by various tissues for energy; free fatty acids also need to bind to proteins, notably albumin, to be transported to tissues for use as energy. Redundant particles and processes are everywhere in the human body!
Below is the bottom part of a typical VAP report, providing a risk summary and some basic recommendations. One of the recommendations is “to lower” the LDL target from 130 mg/dL to 100 mg/dL due to the presence of the checked emerging risk factors on the right, under “Considerations”. What that usually means in practice is a recommendation to take drugs, especially statins, to reduce LDL cholesterol levels. A recent post here and the discussion under it suggest that this would be a highly questionable recommendation in the vast majority of cases ().
What do I think about VAP tests? I think that they are useful in that they provide a lot more information about one’s lipids than standard lipid profiles, and more information is better than less. On the other hand, I think that people should be very careful about what they do with that information. There are even more direct tests that I would recommend before a decision to take drugs is made (, ), if that decision is ever made at all.
Monday, November 28, 2011
Triglycerides, VLDL, and industrial carbohydrate-rich foods
Below are the coefficients of association calculated by HealthCorrelator for Excel (HCE) for user John Doe. The coefficients of association are calculated as linear correlations in HCE (). The focus here is on the associations between fasting triglycerides and various other variables. Take a look at the coefficient of association at the top, with VLDL cholesterol, indicated with a red arrow. It is a very high 0.999.
Whoa! What is this – 0.999! Is John Doe a unique case? No, this strong association between fasting triglycerides and VLDL cholesterol is a very common pattern among HCE users. The reason is simple. VLDL cholesterol is not normally measured directly, but typically calculated from fasting triglycerides, by dividing the fasting triglycerides measurement by 5. And there is an underlying reason for that: fasting triglycerides and VLDL cholesterol are actually very highly correlated, based on direct measurements of these two variables.
But if VLDL cholesterol is calculated based on fasting triglycerides (VLDL cholesterol = fasting triglycerides / 5), how come the correlation is 0.999, and not a perfect 1? The reason is the rounding error in the measurements. Whenever you see a correlation this high (i.e., 0.999), it is reasonable to suspect that the source is an underlying linear relationship disturbed by rounding error.
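The rounding story above is easy to reproduce. In the sketch below (hypothetical triglyceride readings, not John Doe's data), VLDL cholesterol is derived as triglycerides divided by 5 and then rounded to a whole number, the way a lab report might show it:

```python
import numpy as np

# Hypothetical fasting triglyceride readings in mg/dl (not real data).
tg = np.array([62, 87, 103, 118, 131, 149, 166, 184, 201, 223], dtype=float)
vldl_exact = tg / 5.0                # the underlying linear relationship
vldl_rounded = np.round(vldl_exact)  # what a report would typically show

r_exact = np.corrcoef(tg, vldl_exact)[0, 1]
r_rounded = np.corrcoef(tg, vldl_rounded)[0, 1]
print(r_exact, r_rounded)  # essentially 1.0 versus slightly below 1.0
```

The rounding alone is enough to push an exact linear relationship to a correlation of roughly 0.999 – the same pattern seen in John Doe's table.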
Fasting triglycerides are probably the most useful measure on a standard lipid panel. For example, fasting triglycerides below 70 mg/dl suggest an LDL pattern made up predominantly of large and buoyant particles. This pattern is associated with a low incidence of cardiovascular disease (). Also, chronically high fasting triglycerides are a well-known marker of the metabolic syndrome, and a harbinger of type 2 diabetes.
Where do large and buoyant LDL particles come from? They frequently start as "big" (relatively speaking) blobs of fat, which are actually VLDL particles. The photo is from the excellent book by Elliott & Elliott (); it shows, on the same scale: (a) VLDL particles, (b) chylomicrons, (c) LDL particles, and (d) HDL particles. The dark bar at the bottom of each shot is 1,000 Å in length, or 100 nm (Å = angstrom; nm = nanometer; 1 nm = 10 Å).
If you consume an excessive amount of carbohydrates, my theory is that your liver will produce an abnormally large number of small VLDL particles (also shown on the photo above), a proportion of which will end up as small and dense LDL particles. The liver will do that relatively quickly, probably as a short-term compensatory mechanism to avoid glucose toxicity. It will essentially turn excess glucose, from excess carbohydrates, into fat. The VLDL particles carrying that fat in the form of triglycerides will be small because the liver will be in a hurry to clear the excess glucose in circulation, and will have no time to produce large particles, which take longer to produce individually.
This will end up leading to excess triglycerides hanging around in circulation, long after they should have been used as sources of energy. High fasting triglycerides will be a reflection of that. The graphs below, also generated by HCE for John Doe, show how fasting triglycerides and VLDL cholesterol vary in relation to refined carbohydrate consumption. Again, because of rounding error the two graphs are not exactly identical in shape, but they are very close.
Small and dense LDL particles, in the presence of other factors such as systemic inflammation, will contribute to the formation of atherosclerotic plaques. Again, the main source of these particles would be an excessive amount of carbohydrates. What is an excessive amount of carbohydrates? Generally speaking, it is an amount beyond your liver’s capacity to convert the resulting digestion byproducts, fructose and glucose, into liver glycogen. This may come from spaced consumption throughout the day, or acute consumption in an unnatural form (a can of regular coke), or both.
Liver glycogen is sugar stored in the liver. This is the main source of sugar for your brain. If your blood sugar levels become too low, your brain will get angry. Eventually it will go from angry to dead, and you will finally find out what awaits you in the afterlife.
Should you be a healthy athlete who severely depletes liver glycogen stores on a regular basis, you will probably have an above average liver glycogen storage and production capacity. That will be a result of long-term compensatory adaptation to glycogen depleting exercise (). As such, you may be able to consume large amounts of carbohydrates, and you will still not have high fasting triglycerides. You will not carry a lot of body fat either, because the carbohydrates will not be converted to fat and sent into circulation in VLDL particles. They will be used to make liver glycogen.
In fact, if you are a healthy athlete who severely depletes liver glycogen stores on a regular basis, excess calories will be just about the only thing that will contribute to body fat gain. Your threshold for “excess” carbohydrates will be so high that you will feel like the whole low carbohydrate community is not only misguided but also part of a conspiracy against people like you. If you are also an aggressive blog writer, you may feel compelled to tell the world something like this: “Here, I can eat 300 g of carbohydrates per day and maintain single-digit body fat levels! Take that you low carbohydrate idiots!”
Let us say you do not consume an excessive amount of carbohydrates; again, what is excessive or not varies, probably dramatically, from individual to individual. In this case your liver will produce a relatively small number of fat VLDL particles, which will end up as large and buoyant LDL particles. The fat in these large VLDL particles will likely not come primarily from conversion of glucose and/or fructose into fat (i.e., de novo lipogenesis), but from dietary sources of fat.
How do you avoid consuming excess carbohydrates? A good way of achieving that is to avoid man-made carbohydrate-rich foods. Another is adopting a low carbohydrate diet. Yet another is to become a healthy athlete who severely depletes liver glycogen stores on a regular basis; then you can eat a lot of bread, pasta, doughnuts and so on, and keep your fingers crossed for the future.
Either way, fasting triglycerides will be strongly correlated with VLDL cholesterol, because VLDL particles contain both triglycerides (“encapsulated” fat, not to be confused with “free” fatty acids) and cholesterol. If a large number of VLDL particles are produced by one’s liver, the person’s fasting triglycerides reading will be high. If a small number of VLDL particles are produced, even if they are fat particles, the fasting triglycerides reading will be relatively low. Neither VLDL cholesterol nor fasting triglycerides will be zero though.
Now, you may be wondering, how come a small number of fat VLDL particles will eventually lead to low fasting triglycerides? After all, they are fat particles, even though they occur in fewer numbers. My hypothesis is that having a large number of small-dense VLDL particles in circulation is an abnormal, unnatural state, and that our body is not well designed to deal with that state. Use of lipoprotein-bound fat as a source of energy in this state becomes somewhat less efficient, leading to high triglycerides in circulation; and also to hunger, as our mitochondria like fat.
This hypothesis, and the theory outlined above, fit well with the numbers I have been seeing for quite some time from HCE users. Note that it is a bit different from the more popular theory, particularly among low carbohydrate writers, that fat is force-stored in adipocytes (fat cells) by insulin and not released for use as energy, also leading to hunger. What I am saying here, which is compatible with this more popular theory, is that lipoproteins, like adipocytes, also end up holding more fat than they should if you consume excess carbohydrates, and for longer.
Want to improve your health? Consider replacing things like bread and cereal with butter and eggs in your diet (). And also go see your doctor (); if he disagrees with this recommendation, ask him to read this post and explain why he disagrees.
Monday, September 12, 2011
Fasting blood glucose of 83 mg/dl and heart disease: Fact and fiction
If you are interested in the connection between blood glucose control and heart disease, you have probably done your homework. This is a scary connection, and sometimes the information on the Internetz makes people even more scared. You have probably seen something to this effect mentioned:
So I decided to take a look at the Brunner and colleagues study. It covers, among other things, the relationship between cardiovascular disease (they use the acronym CHD for this), and 2-hour blood glucose levels after a 50-g oral glucose tolerance test (OGTT). They tested thousands of men at one point in time, and then followed them for over 30 years, which is really impressive. The graph below shows the relationship between CHD and blood glucose in mmol/l. Here is a calculator to convert the values to mg/dl.
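For reference, the conversion such a calculator performs is a simple multiplication: glucose in mg/dl is mmol/l times the molar mass of glucose (about 180 g/mol) divided by 10. A minimal sketch:

```python
def mmol_to_mgdl(mmol_per_l, molar_mass=180.16):
    """Convert a glucose reading from mmol/l to mg/dl.

    mg/dl = mmol/l * molar mass (g/mol) / 10. For glucose the molar
    mass is about 180 g/mol, hence the familiar "multiply by 18" rule.
    """
    return mmol_per_l * molar_mass / 10.0

# A few of the glucose levels discussed in this post:
for mmol in (3.5, 5.5, 6.7):
    print(mmol, "mmol/l =", round(mmol_to_mgdl(mmol)), "mg/dl")
```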
The authors note in the limitations section: “Fasting glucose was not measured.” So these results have nothing to do with fasting glucose, as we are led to believe when we see this study cited on the web. Also, in the abstract the authors say that there is “no evidence of nonlinearity”, but in the results section they say that the data provides “evidence of a nonlinear relationship”. The relationship sure looks nonlinear to me. I tried to approximate it manually below.
Note that CHD mortality really goes up more clearly after a glucose level of 5.5 mmol/l (100 mg/dl). But it also varies significantly more widely after that level; the magnitudes of the error bars reflect that. Also, you can see that at around 6.7 mmol/l (121 mg/dl), CHD mortality is on average about the same as at 5.5 mmol/l (100 mg/dl) and 3.5 mmol/l (63 mg/dl). This last level suggests an abnormally high insulin response, bringing blood glucose levels down too much at the 2-hour mark – i.e., reactive hypoglycemia, which the study completely ignores.
These findings are consistent with the somewhat chaotic nature of blood glucose variations in normoglycemic individuals, and also with evidence suggesting that average blood glucose levels go up with age in a J-curve fashion even in long-lived individuals.
We also know that traits vary along a bell curve for any population of individuals. Research results are often reported as averages, but the average individual does not exist. The average individual is an abstraction, and you are not it. Glucose metabolism is a complex trait, which is influenced by many factors. This is why there is so much variation in mortality for different glucose levels, as indicated by the magnitudes of the error bars.
In any event, these findings are clearly inconsistent with the statement that "heart disease risk increases in a linear fashion as fasting blood glucose rises beyond 83 mg/dl". The authors even state early in the article that another study based on the same dataset, to which theirs was a follow-up, suggested that:
Many of the complications from diabetes, including heart disease, stem from poor glucose control. But it seems increasingly clear that blood glucose control does not have to be perfect to keep those complications at bay. For most people, blood glucose levels can be maintained within a certain range with the proper diet and lifestyle. You may be looking at a long life if you catch the problem early, even if your blood glucose is not always at 83 mg/dl (4.6 mmol/l). More on this on my next post.
“Heart disease risk increases in a linear fashion as fasting blood glucose rises beyond 83 mg/dl.”

In fact, I have seen this many times, including on some very respectable blogs. I suspect it started with one blogger, and then got repeated over and over again by others; sometimes things become “true” through repetition. Frequently the reference cited is a study by Brunner and colleagues, published in Diabetes Care in 2006. I doubt very much the bloggers in question actually read this article. Sometimes a study by Coutinho and colleagues is also cited, but this latter study is actually a meta-analysis.
So I decided to take a look at the Brunner and colleagues study. It covers, among other things, the relationship between coronary heart disease (they use the acronym CHD for this) and 2-hour blood glucose levels after a 50-g oral glucose tolerance test (OGTT). They tested thousands of men at one point in time, and then followed them for over 30 years, which is really impressive. The graph below shows the relationship between CHD mortality and blood glucose in mmol/l. Here is a calculator to convert the values to mg/dl.
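In place of the calculator, the conversion is simple enough to sketch in code. For glucose, the standard factor is about 18.016 mg/dl per mmol/l (derived from glucose's molar mass of roughly 180.16 g/mol); the helper below is illustrative, not part of the study:

```python
# Convert blood glucose between mmol/l and mg/dl.
# Glucose's molar mass (~180.16 g/mol) gives the usual conversion
# factor of about 18.016 mg/dl per mmol/l.

GLUCOSE_FACTOR = 18.016

def mmol_to_mgdl(mmol: float) -> float:
    """Blood glucose in mmol/l -> mg/dl."""
    return mmol * GLUCOSE_FACTOR

def mgdl_to_mmol(mgdl: float) -> float:
    """Blood glucose in mg/dl -> mmol/l."""
    return mgdl / GLUCOSE_FACTOR

if __name__ == "__main__":
    for mmol in (3.5, 5.5, 6.7):
        print(f"{mmol} mmol/l ~ {mmol_to_mgdl(mmol):.0f} mg/dl")
```

This is why 5.5 mmol/l is quoted as roughly 100 mg/dl, and 6.7 mmol/l as roughly 121 mg/dl.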
The authors note in the limitations section that: “Fasting glucose was not measured.” So these results have nothing to do with fasting glucose, as we are led to believe when we see this study cited on the web. Also, in the abstract, the authors say that there is “no evidence of nonlinearity”, but in the results section they say that the data provides “evidence of a nonlinear relationship”. The relationship sure looks nonlinear to me. I tried to approximate it manually below.
Note that CHD mortality really goes up more clearly after a glucose level of 5.5 mmol/l (100 mg/dl). But it also varies significantly more widely after that level; the magnitudes of the error bars reflect that. Also, you can see that at around 6.7 mmol/l (121 mg/dl), CHD mortality is on average about the same as at 5.5 mmol/l (100 mg/dl) and 3.5 mmol/l (63 mg/dl). This last level suggests an abnormally high insulin response, bringing blood glucose levels down too much at the 2-hour mark – i.e., reactive hypoglycemia, which the study completely ignores.
These findings are consistent with the somewhat chaotic nature of blood glucose variations in normoglycemic individuals, and also with evidence suggesting that average blood glucose levels go up with age in a J-curve fashion even in long-lived individuals.
We also know that traits vary along a bell curve for any population of individuals. Research results are often reported as averages, but the average individual does not exist. The average individual is an abstraction, and you are not it. Glucose metabolism is a complex trait, which is influenced by many factors. This is why there is so much variation in mortality for different glucose levels, as indicated by the magnitudes of the error bars.
In any event, these findings are clearly inconsistent with the statement that "heart disease risk increases in a linear fashion as fasting blood glucose rises beyond 83 mg/dl". The authors even state early in the article that another study based on the same dataset, to which theirs was a follow-up, suggested that:
“… [CHD was associated with levels above] a postload glucose of 5.3 mmol/l [95 mg/dl], but below this level the degree of glycemia was not associated with coronary risk.”

Now, exaggerating the facts, to the point of creating fictitious results, may have a positive effect. It may scare people enough that they will actually check their blood glucose levels. Perhaps people will remove certain foods like doughnuts and jelly beans from their diets, or at least reduce their consumption dramatically. However, many people may find themselves with higher fasting blood glucose levels even after removing those foods from their diets, as their bodies adapt to lower circulating insulin levels. Some may see higher levels after doing other things that are likely to improve their health in the long term. Others may see higher levels as they get older.
Many of the complications from diabetes, including heart disease, stem from poor glucose control. But it seems increasingly clear that blood glucose control does not have to be perfect to keep those complications at bay. For most people, blood glucose levels can be maintained within a certain range with the proper diet and lifestyle. You may be looking at a long life if you catch the problem early, even if your blood glucose is not always at 83 mg/dl (4.6 mmol/l). More on this in my next post.
Labels:
cardiovascular disease,
diabetes,
glucose,
heart disease,
J curve,
research
Tuesday, September 28, 2010
Income, obesity, and heart disease in US states
The figure below combines data on median income by state (bottom-left and top-right), as well as a plot of heart disease death rates against percentage of population with body mass index (BMI) greater than 30. The data are recent, and have been provided by CNN.com and creativeclass.com, respectively.
Heart disease deaths and obesity are strongly associated with each other, and both are inversely associated with median income. US states with lower median income tend to have generally higher rates of obesity and heart disease deaths.
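The inverse associations described here are bivariate correlations across states. As a sketch of how such a correlation is computed (with made-up illustrative numbers, not the CNN.com or creativeclass.com data):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical state-level figures: median income (thousands of USD)
# versus heart disease deaths per 100,000 -- illustrative only.
income = [37, 40, 44, 50, 56, 62]
deaths = [280, 260, 240, 210, 190, 170]
print(pearson_r(income, deaths))  # strongly negative, consistent with an inverse association
```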
The reasons are probably many, complex, and closely interconnected. Low income is usually associated with high rates of stress, depression, smoking, alcoholism, and poor nutrition. Compounding the problem, these are normally associated with consumption of cheap, addictive, highly refined foods.
Interestingly, this is primarily an urban phenomenon. If you were to use hunter-gatherers as your data sources, you would probably see the opposite relationship. For example, non-westernized hunter-gatherers have no income (at least not in the “normal” sense), but typically have a lower incidence of obesity and heart disease than mildly westernized ones. The latter have some income.
Tragically, the first few generations of fully westernized hunter-gatherers usually find themselves in the worst possible spot.
Labels:
cardiovascular disease,
heart disease,
income,
obesity,
research
Sunday, September 12, 2010
The China Study II: Wheat flour, rice, and cardiovascular disease
In my last post on the China Study II, I analyzed the effect of total and HDL cholesterol on mortality from all cardiovascular diseases. The main conclusion was that total and HDL cholesterol were protective. Total and HDL cholesterol usually increase with intake of animal foods, and particularly of animal fat. The lowest mortality from all cardiovascular diseases was in the highest total cholesterol range, 172.5 to 180; and the highest mortality in the lowest total cholesterol range, 120 to 127.5. The difference was quite large; the mortality in the lowest range was approximately 3.3 times higher than in the highest.
This post focuses on the intake of two main plant foods, namely wheat flour and rice, and their relationships with mortality from all cardiovascular diseases. After many exploratory multivariate analyses, wheat flour and rice emerged as the plant foods with the strongest associations with mortality from all cardiovascular diseases. Moreover, wheat flour and rice have a strong and inverse relationship with each other, which suggests a “consumption divide”. Since the data is from China in the late 1980s, it is likely that consumption of wheat flour is even higher now. As you’ll see, this picture is alarming.
The main model and results
All of the results reported here are from analyses conducted using WarpPLS. Below is the model with the main results of the analyses. (Click on it to enlarge. Use the "CTRL" and "+" keys to zoom in, and "CTRL" and "-" to zoom out.) The arrows explore associations between variables, which are shown within ovals. The meaning of each variable is the following: SexM1F2 = sex, with 1 assigned to males and 2 to females; MVASC = mortality from all cardiovascular diseases (ages 35-69); TKCAL = total calorie intake per day; WHTFLOUR = wheat flour intake (g/day); and RICE = rice intake (g/day).
The variables to the left of MVASC are the main predictors of interest in the model. The one to the right is a control variable – SexM1F2. The path coefficients (indicated as beta coefficients) reflect the strength of the relationships. A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to. The P values indicate the statistical significance of the relationship; a P lower than 0.05 generally means a significant relationship (95 percent or higher likelihood that the relationship is “real”).
In summary, the model above seems to be telling us that:
- As rice intake increases, wheat flour intake decreases significantly (beta=-0.84; P<0.01). This relationship would be the same if the arrow pointed in the opposite direction. It suggests that there is a sharp divide between rice-consuming and wheat flour-consuming regions.
- As wheat flour intake increases, mortality from all cardiovascular diseases increases significantly (beta=0.32; P<0.01). This is after controlling for the effects of rice and total calorie intake. That is, wheat flour seems to have some inherent properties that make it bad for one’s health, even if one doesn’t consume that many calories.
- As rice intake increases, mortality from all cardiovascular diseases decreases significantly (beta=-0.24; P<0.01). This is after controlling for the effects of wheat flour and total calorie intake. That is, this effect is not entirely due to rice being consumed in place of wheat flour. Still, as you’ll see later in this post, this relationship is nonlinear. Excessive rice intake does not seem to be very good for one’s health either.
- Increases in wheat flour and rice intake are significantly associated with increases in total calorie intake (betas=0.25, 0.33; P<0.01). This may be due to wheat flour and rice intake: (a) being themselves, in terms of their own caloric content, main contributors to the total calorie intake; or (b) causing an increase in calorie intake from other sources. The former is more likely, given the effect below.
- The effect of total calorie intake on mortality from all cardiovascular diseases is insignificant when we control for the effects of rice and wheat flour intakes (beta=0.08; P=0.35). This suggests that neither wheat flour nor rice exerts an effect on mortality from all cardiovascular diseases by increasing total calorie intake from other food sources.
- Being female is significantly associated with a reduction in mortality from all cardiovascular diseases (beta=-0.24; P=0.01). This is to be expected. In other words, men are women with a few design flaws, so to speak. (This situation reverses itself a bit after menopause.)
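Path coefficients of this kind behave like standardized partial regression coefficients: each predictor's beta is estimated while controlling for the others. A minimal sketch of that idea, using plain ordinary least squares on standardized variables and synthetic data (not the China Study figures, and not WarpPLS's actual estimation algorithm):

```python
import random

def standardize(xs):
    """Center to mean 0 and scale to standard deviation 1."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return [(x - m) / sd for x in xs]

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def standardized_betas(predictors, outcome):
    """OLS betas on standardized variables (no intercept needed)."""
    zs = [standardize(p) for p in predictors]
    zy = standardize(outcome)
    n, k = len(zy), len(zs)
    xtx = [[sum(zs[i][t] * zs[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(zs[i][t] * zy[t] for t in range(n)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic example: outcome driven positively by x1, negatively by x2.
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(500)]
x2 = [random.gauss(0, 1) for _ in range(500)]
y = [0.4 * a - 0.3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
betas = standardized_betas([x1, x2], y)
print(betas)  # first beta positive, second negative, echoing the generating weights
```

A negative beta, as with RICE above, means the predictor is associated with a decrease in the outcome after the other predictors are held constant.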
Wheat flour displaces rice
The graph below shows the shape of the association between wheat flour intake (WHTFLOUR) and rice intake (RICE). The values are provided in standardized format; e.g., 0 is the mean (a.k.a. average), 1 is one standard deviation above the mean, and so on. The curve is the best-fitting U curve obtained by the software. It actually has the shape of an exponential decay curve, which can be seen as a section of a U curve. This suggests that wheat flour consumption has strongly displaced rice consumption in several regions in China, and also that wherever rice consumption is high wheat flour consumption tends to be low.
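An exponential decay curve of the general form y = a·exp(−b·x) can be fitted by simple log-linear regression, which is one common way to handle this shape (this is an illustration with made-up points, not the algorithm WarpPLS uses):

```python
import math

def fit_exp_decay(xs, ys):
    """Fit y = a * exp(-b * x) by linear regression on log(y).
    Requires all y values to be positive."""
    lys = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(lys) / n
    slope = (sum((x - mx) * (ly - my) for x, ly in zip(xs, lys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # a, b

# Illustrative points lying exactly on y = 5 * exp(-0.8 * x).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [5 * math.exp(-0.8 * x) for x in xs]
a, b = fit_exp_decay(xs, ys)
print(a, b)  # recovers a ~ 5 and b ~ 0.8 for this noise-free data
```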
As wheat flour intake goes up, so does cardiovascular disease mortality
The graphs below show the shapes of the association between wheat flour intake (WHTFLOUR) and mortality from all cardiovascular diseases (MVASC). In the first graph, the values are provided in standardized format; e.g., 0 is the mean (or average), 1 is one standard deviation above the mean, and so on. In the second graph, the values are provided in unstandardized format and organized in terciles (each of three equal intervals).
The curve in the first graph is the best-fitting U curve obtained by the software. It is a quasi-linear relationship. The higher the consumption of wheat flour in a county, the higher seems to be the mortality from all cardiovascular diseases. The second graph suggests that mortality in the third tercile, which represents a consumption of wheat flour of 501 to 751 g/day (a lot!), is 69 percent higher than mortality in the first tercile (0 to 251 g/day).
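The terciles here are equal-width intervals of intake (0 to 251, 251 to 501, and 501 to 751 g/day), not equal-size groups of counties. Bucketing observations this way and comparing mean mortality across buckets can be sketched as follows (with hypothetical numbers, not the China Study II data):

```python
def equal_width_terciles(values):
    """Assign each value to one of three equal-width intervals (0, 1, 2)
    spanning the range of the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 3
    def bucket(v):
        i = int((v - lo) / width) if width else 0
        return min(i, 2)  # clamp the maximum into the top tercile
    return [bucket(v) for v in values]

def mean_by_tercile(intakes, outcomes):
    """Mean outcome within each equal-width tercile of intake."""
    buckets = equal_width_terciles(intakes)
    sums, counts = [0.0] * 3, [0] * 3
    for b, y in zip(buckets, outcomes):
        sums[b] += y
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Hypothetical intakes (g/day) and mortality rates -- illustrative only.
intake = [0, 100, 250, 300, 400, 500, 600, 700, 751]
mortality = [10, 11, 10, 12, 12, 13, 16, 17, 18]
print(mean_by_tercile(intake, mortality))
```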
Rice seems to be protective, as long as intake is not too high
The graphs below show the shapes of the association between rice intake (RICE) and mortality from all cardiovascular diseases (MVASC). In the first graph, the values are provided in standardized format. In the second graph, the values are provided in unstandardized format and organized in terciles.
Here the relationship is more complex. The lowest mortality is clearly in the second tercile (206 to 412 g/day). There is a lot of variation in the first tercile, as suggested by the first graph with the U curve. (Remember, as rice intake goes down, wheat flour intake tends to go up.) The U curve here looks similar to the exponential decay curve shown earlier in the post, for the relationship between rice and wheat flour intake.
In fact, the shape of the association between rice intake and mortality from all cardiovascular diseases looks a bit like an “echo” of the shape of the relationship between rice and wheat flour intake. Here is what is creepy. This echo looks somewhat like the first curve (between rice and wheat flour intake), but with wheat flour intake replaced by “death” (i.e., mortality from all cardiovascular diseases).
What does this all mean?
- Wheat flour displacing rice does not look like a good thing. Wheat flour intake seems to have strongly displaced rice intake in the counties where it is heavily consumed. Generally speaking, that does not seem to have been a good thing. It looks like this is generally associated with increased mortality from all cardiovascular diseases.
- High glycemic index food consumption does not seem to be the problem here. Wheat flour and rice have very similar glycemic indices (but generally not glycemic loads; see below). Both lead to blood glucose and insulin spikes. Yet, rice consumption seems protective when it is not excessive. This is true in part (but not entirely) because it largely displaces wheat flour. Moreover, neither rice nor wheat flour consumption seems to be significantly associated with cardiovascular disease via an increase in total calorie consumption. This is a bit of a blow to the theory that high glycemic carbohydrates necessarily cause obesity, diabetes, and eventually cardiovascular disease.
- The problem with wheat flour is … hard to pinpoint, based on the results summarized here. Maybe it is the fact that it is an ultra-refined carbohydrate-rich food; less refined forms of wheat could be healthier. In fact, the glycemic loads of less refined carbohydrate-rich foods tend to be much lower than those of more refined ones. (Also, boiled brown rice has a glycemic load that is about three times lower than that of whole wheat bread; whereas the glycemic indices are about the same.) Maybe the problem is wheat flour's gluten content. Maybe it is a combination of various factors, including these.
Reference
Kock, N. (2010). WarpPLS 1.0 User Manual. Laredo, Texas: ScriptWarp Systems.
Acknowledgment and notes
- Many thanks are due to Dr. Campbell and his collaborators for collecting and compiling the data used in this analysis. The data is from this site, created by those researchers to disseminate their work in connection with a study often referred to as the “China Study II”. It has already been analyzed by other bloggers. Notable analyses have been conducted by Ricardo at Canibais e Reis, Stan at Heretic, and Denise at Raw Food SOS.
- The path coefficients (indicated as beta coefficients) reflect the strength of the relationships; they are a bit like standard univariate (or Pearson) correlation coefficients, except that they take into consideration multivariate relationships (they control for competing effects on each variable). Whenever nonlinear relationships were modeled, the path coefficients were automatically corrected by the software to account for nonlinearity.
- The software used here identifies non-cyclical and mono-cyclical relationships such as logarithmic, exponential, and hyperbolic decay relationships. Once a relationship is identified, data values are corrected and coefficients calculated. This is not the same as log-transforming data prior to analysis, which is widely used but only works if the underlying relationship is logarithmic. Otherwise, log-transforming data may distort the relationship even more than assuming that it is linear, which is what is done by most statistical software tools.
- The R-squared values reflect the percentage of explained variance for certain variables; the higher they are, the better the model fit with the data. In complex and multi-factorial phenomena such as health-related phenomena, many would consider an R-squared of 0.20 as acceptable. Still, such an R-squared would mean that 80 percent of the variance for a particular variable is unexplained by the data.
- The P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the assumption that the data is normally distributed to be met. This and other related techniques also tend to yield more reliable results for small samples, and samples with outliers (as long as the outliers are “good” data, and are not the result of measurement error).
- Only two data points per county were used (for males and females). This increased the sample size of the dataset without artificially reducing variance, which is desirable since the dataset is relatively small. This also allowed for the test of commonsense assumptions (e.g., the protective effects of being female), which is always a good idea in a complex analysis because violation of commonsense assumptions may suggest data collection or analysis error. On the other hand, it required the inclusion of a sex variable as a control variable in the analysis, which is no big deal.
- Since all the data was collected around the same time (late 1980s), this analysis assumes a somewhat static pattern of consumption of rice and wheat flour. In other words, let us assume that variations in consumption of a particular food do lead to variations in mortality. Still, that effect will typically take years to manifest itself. This is a major limitation of this dataset and any related analyses.
- Mortality from schistosomiasis infection (MSCHIST) does not confound the results presented here. Only counties where no deaths from schistosomiasis infection were reported have been included in this analysis. Mortality from all cardiovascular diseases (MVASC) was measured using the variable M059 ALLVASCc (ages 35-69). See this post for other notes that apply here as well.
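The jackknife mentioned in the notes above can be sketched in a few lines: each observation is left out in turn, the statistic is recomputed, and the spread of the leave-one-out estimates gives a standard error. This is a generic illustration of the resampling idea, not WarpPLS's internal procedure:

```python
def jackknife_se(data, statistic):
    """Leave-one-out jackknife standard error of `statistic` over `data`."""
    n = len(data)
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    # Jackknife variance: (n-1)/n times the sum of squared deviations
    # of the leave-one-out estimates from their mean.
    var = (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)
    return var ** 0.5

def mean(xs):
    return sum(xs) / len(xs)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(jackknife_se(sample, mean))
```

For the sample mean, the jackknife standard error coincides with the familiar formula s/√n, which makes it a handy sanity check; its real value is for statistics (like path coefficients) whose standard errors have no simple closed form.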
This post focuses on the intake of two main plant foods, namely wheat flour and rice intake, and their relationships with mortality from all cardiovascular diseases. After many exploratory multivariate analyses, wheat flour and rice emerged as the plant foods with the strongest associations with mortality from all cardiovascular diseases. Moreover, wheat flour and rice have a strong and inverse relationship with each other, which suggests a “consumption divide”. Since the data is from China in the late 1980s, it is likely that consumption of wheat flour is even higher now. As you’ll see, this picture is alarming.
The main model and results
All of the results reported here are from analyses conducted using WarpPLS. Below is the model with the main results of the analyses. (Click on it to enlarge. Use the "CRTL" and "+" keys to zoom in, and CRTL" and "-" to zoom out.) The arrows explore associations between variables, which are shown within ovals. The meaning of each variable is the following: SexM1F2 = sex, with 1 assigned to males and 2 to females; MVASC = mortality from all cardiovascular diseases (ages 35-69); TKCAL = total calorie intake per day; WHTFLOUR = wheat flour intake (g/day); and RICE = and rice intake (g/day).
The variables to the left of MVASC are the main predictors of interest in the model. The one to the right is a control variable – SexM1F2. The path coefficients (indicated as beta coefficients) reflect the strength of the relationships. A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to. The P values indicate the statistical significance of the relationship; a P lower than 0.05 generally means a significant relationship (95 percent or higher likelihood that the relationship is “real”).
In summary, the model above seems to be telling us that:
- As rice intake increases, wheat flour intake decreases significantly (beta=-0.84; P<0.01). This relationship would be the same if the arrow pointed in the opposite direction. It suggests that there is a sharp divide between rice-consuming and wheat flour-consuming regions.
- As wheat flour intake increases, mortality from all cardiovascular diseases increases significantly (beta=0.32; P<0.01). This is after controlling for the effects of rice and total calorie intake. That is, wheat flour seems to have some inherent properties that make it bad for one’s health, even if one doesn’t consume that many calories.
- As rice intake increases, mortality from all cardiovascular diseases decreases significantly (beta=-0.24; P<0.01). This is after controlling for the effects of wheat flour and total calorie intake. That is, this effect is not entirely due to rice being consumed in place of wheat flour. Still, as you’ll see later in this post, this relationship is nonlinear. Excessive rice intake does not seem to be very good for one’s health either.
- Increases in wheat flour and rice intake are significantly associated with increases in total calorie intake (betas=0.25, 0.33; P<0.01). This may be due to wheat flour and rice intake: (a) being themselves, in terms of their own caloric content, main contributors to the total calorie intake; or (b) causing an increase in calorie intake from other sources. The former is more likely, given the effect below.
- The effect of total calorie intake on mortality from all cardiovascular diseases is insignificant when we control for the effects of rice and wheat flour intakes (beta=0.08; P=0.35). This suggests that neither wheat flour nor rice exerts an effect on mortality from all cardiovascular diseases by increasing total calorie intake from other food sources.
- Being female is significantly associated with a reduction in mortality from all cardiovascular diseases (beta=-0.24; P=0.01). This is to be expected. In other words, men are women with a few design flaws, so to speak. (This situation reverses itself a bit after menopause.)
Wheat flour displaces rice
The graph below shows the shape of the association between wheat flour intake (WHTFLOUR) and rice intake (RICE). The values are provided in standardized format; e.g., 0 is the mean (a.k.a. average), 1 is one standard deviation above the mean, and so on. The curve is the best-fitting U curve obtained by the software. It actually has the shape of an exponential decay curve, which can be seen as a section of a U curve. This suggests that wheat flour consumption has strongly displaced rice consumption in several regions in China, and also that wherever rice consumption is high wheat flour consumption tends to be low.
As wheat flour intake goes up, so does cardiovascular disease mortality
The graphs below show the shapes of the association between wheat flour intake (WHTFLOUR) and mortality from all cardiovascular diseases (MVASC). In the first graph, the values are provided in standardized format; e.g., 0 is the mean (or average), 1 is one standard deviation above the mean, and so on. In the second graph, the values are provided in unstandardized format and organized in terciles (each of three equal intervals).
The curve in the first graph is the best-fitting U curve obtained by the software. It is a quasi-linear relationship. The higher the consumption of wheat flour in a county, the higher seems to be the mortality from all cardiovascular diseases. The second graph suggests that mortality in the third tercile, which represents a consumption of wheat flour of 501 to 751 g/day (a lot!), is 69 percent higher than mortality in the first tercile (0 to 251 g/day).
Rice seems to be protective, as long as intake is not too high
The graphs below show the shapes of the association between rice intake (RICE) and mortality from all cardiovascular diseases (MVASC). In the first graph, the values are provided in standardized format. In the second graph, the values are provided in unstandardized format and organized in terciles.
Here the relationship is more complex. The lowest mortality is clearly in the second tercile (206 to 412 g/day). There is a lot of variation in the first tercile, as suggested by the first graph with the U curve. (Remember, as rice intake goes down, wheat flour intake tends to go up.) The U curve here looks similar to the exponential decay curve shown earlier in the post, for the relationship between rice and wheat flour intake.
In fact, the shape of the association between rice intake and mortality from all cardiovascular diseases looks a bit like an “echo” of the shape of the relationship between rice and wheat flour intake. Here is what is creepy. This echo looks somewhat like the first curve (between rice and wheat flour intake), but with wheat flour intake replaced by “death” (i.e., mortality from all cardiovascular diseases).
What does this all mean?
- Wheat flour displacing rice does not look like a good thing. Wheat flour intake seems to have strongly displaced rice intake in the counties where it is heavily consumed. Generally speaking, that does not seem to have been a good thing. It looks like this is generally associated with increased mortality from all cardiovascular diseases.
- High glycemic index food consumption does not seem to be the problem here. Wheat flour and rice have very similar glycemic indices (but generally not glycemic loads; see below). Both lead to blood glucose and insulin spikes. Yet, rice consumption seems protective when it is not excessive. This is true in part (but not entirely) because it largely displaces wheat flour. Moreover, neither rice nor wheat flour consumption seems to be significantly associated with cardiovascular disease via an increase in total calorie consumption. This is a bit of a blow to the theory that high glycemic carbohydrates necessarily cause obesity, diabetes, and eventually cardiovascular disease.
- The problem with wheat flour is … hard to pinpoint, based on the results summarized here. Maybe it is the fact that it is an ultra-refined carbohydrate-rich food; less refined forms of wheat could be healthier. In fact, the glycemic loads of less refined carbohydrate-rich foods tend to be much lower than those of more refined ones. (Also, boiled brown rice has a glycemic load that is about three times lower than that of whole wheat bread; whereas the glycemic indices are about the same.) Maybe the problem is wheat flour's gluten content. Maybe it is a combination of various factors, including these.
Reference
Kock, N. (2010). WarpPLS 1.0 User Manual. Laredo, Texas: ScriptWarp Systems.
Acknowledgment and notes
- Many thanks are due to Dr. Campbell and his collaborators for collecting and compiling the data used in this analysis. The data is from this site, created by those researchers to disseminate their work in connection with a study often referred to as the “China Study II”. It has already been analyzed by other bloggers. Notable analyses have been conducted by Ricardo at Canibais e Reis, Stan at Heretic, and Denise at Raw Food SOS.
- The path coefficients (indicated as beta coefficients) reflect the strength of the relationships; they are a bit like standard univariate (or Pearson) correlation coefficients, except that they take into consideration multivariate relationships (they control for competing effects on each variable). Whenever nonlinear relationships were modeled, the path coefficients were automatically corrected by the software to account for nonlinearity.
- The software used here identifies non-cyclical and mono-cyclical relationships such as logarithmic, exponential, and hyperbolic decay relationships. Once a relationship is identified, data values are corrected and coefficients calculated. This is not the same as log-transforming data prior to analysis, which is widely used but only works if the underlying relationship is logarithmic. Otherwise, log-transforming data may distort the relationship even more than assuming that it is linear, which is what is done by most statistical software tools.
- The R-squared values reflect the percentage of explained variance for certain variables; the higher they are, the better the model fit with the data. In complex and multi-factorial phenomena such as health-related phenomena, many would consider an R-squared of 0.20 as acceptable. Still, such an R-squared would mean that 80 percent of the variance for a particularly variable is unexplained by the data.
- The P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the assumption that the data is normally distributed to be met. This and other related techniques also tend to yield more reliable results for small samples, and samples with outliers (as long as the outliers are “good” data, and are not the result of measurement error).
- Only two data points per county were used (for males and females). This increased the sample size of the dataset without artificially reducing variance, which is desirable since the dataset is relatively small. This also allowed for the test of commonsense assumptions (e.g., the protective effects of being female), which is always a good idea in a complex analysis because violation of commonsense assumptions may suggest data collection or analysis error. On the other hand, it required the inclusion of a sex variable as a control variable in the analysis, which is no big deal.
- Since all the data was collected around the same time (late 1980s), this analysis assumes a somewhat static pattern of consumption of rice and wheat flour. In other words, let us assume that variations in consumption of a particular food do lead to variations in mortality. Still, that effect will typically take years to manifest itself. This is a major limitation of this dataset and any related analyses.
- Mortality from schistosomiasis infection (MSCHIST) does not confound the results presented here. Only counties where no deaths from schistosomiasis infection were reported have been included in this analysis. Mortality from all cardiovascular diseases (MVASC) was measured using the variable M059 ALLVASCc (ages 35-69). See this post for other notes that apply here as well.
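To illustrate the point about log-transforming: regressing on the log of the predictor linearizes a truly logarithmic relationship, but not, say, a hyperbolic decay. Here is a small sketch with synthetic data (the `pearson_r` helper is mine, written only for illustration):

```python
# Illustration: log-transforming the predictor linearizes a logarithmic
# relationship, but a hyperbolic-decay relationship remains nonlinear.
# All data here are synthetic, for illustration only.
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 4, 8, 16, 32]
log_x = [math.log(v) for v in x]

y_log = [2 * math.log(v) + 1 for v in x]   # truly logarithmic relationship
y_hyp = [10.0 / v for v in x]              # hyperbolic decay

print(pearson_r(log_x, y_log))  # essentially 1.0: log transform linearizes it
print(pearson_r(log_x, y_hyp))  # clearly short of -1.0: still nonlinear
```

The first correlation is (up to floating-point error) perfect; the second is strong but imperfect, because the log transform does not match the underlying hyperbolic shape.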
Labels:
cardiovascular disease,
China Study,
multivariate analysis,
research,
rice,
statistics,
warppls,
wheat
Wednesday, September 8, 2010
The China Study II: Cholesterol seems to protect against cardiovascular disease
First of all, many thanks are due to Dr. Campbell and his collaborators for collecting and compiling the data used in this analysis. This data is from this site, created by those researchers to disseminate the data from a study often referred to as the “China Study II”. It has already been analyzed by other bloggers. Notable analyses have been conducted by Ricardo at Canibais e Reis, Stan at Heretic, and Denise at Raw Food SOS.
The analyses in this post differ from those other analyses in various aspects. One of them is that data for males and females were used separately for each county, instead of the totals per county. Only two data points per county were used (for males and females). This increased the sample size of the dataset without artificially reducing variance (for more details, see “Notes” at the end of the post), which is desirable since the dataset is relatively small. This also allowed for the test of commonsense assumptions (e.g., the protective effects of being female), which is always a good idea in a complex analysis because violation of commonsense assumptions may suggest data collection or analysis error. On the other hand, it required the inclusion of a sex variable as a control variable in the analysis, which is no big deal.
The analysis was conducted using WarpPLS. Below is the model with the main results of the analysis. (Click on it to enlarge. Use the "CTRL" and "+" keys to zoom in, and "CTRL" and "-" to zoom out.) The arrows explore associations between variables, which are shown within ovals. The meaning of each variable is the following: SexM1F2 = sex, with 1 assigned to males and 2 to females; HDLCHOL = HDL cholesterol; TOTCHOL = total cholesterol; MSCHIST = mortality from schistosomiasis infection; and MVASC = mortality from all cardiovascular diseases.
The variables to the left of MVASC are the main predictors of interest in the model – HDLCHOL and TOTCHOL. The ones to the right are control variables – SexM1F2 and MSCHIST. The path coefficients (indicated as beta coefficients) reflect the strength of the relationships. A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to. The P values indicate the statistical significance of the relationship; a P lower than 0.05 generally means a significant relationship (95 percent or higher likelihood that the relationship is “real”).
In summary, this is what the model above is telling us:
- As HDL cholesterol increases, total cholesterol increases significantly (beta=0.48; P<0.01). This is to be expected, as HDL is a main component of total cholesterol, together with VLDL and LDL cholesterol.
- As total cholesterol increases, mortality from all cardiovascular diseases decreases significantly (beta=-0.25; P<0.01). This is to be expected if we assume that total cholesterol is in part an intervening variable between HDL cholesterol and mortality from all cardiovascular diseases. This assumption can be tested through a separate model (more below). Also, there is more to this story, as noted below.
- The effect of HDL cholesterol on mortality from all cardiovascular diseases is insignificant when we control for the effect of total cholesterol (beta=-0.08; P=0.26). This suggests that HDL’s protective role is subsumed by the variable total cholesterol, and also that it is possible that there is something else associated with total cholesterol that makes it protective. Otherwise the effect of total cholesterol might have been insignificant, and the effect of HDL cholesterol significant (the reverse of what we see here).
- Being female is significantly associated with a reduction in mortality from all cardiovascular diseases (beta=-0.16; P=0.01). This is to be expected. In other words, men are women with a few design flaws. (This situation reverses itself a bit after menopause.)
- Mortality from schistosomiasis infection is significantly and inversely associated with mortality from all cardiovascular diseases (beta=-0.28; P<0.01). This is probably due to those dying from schistosomiasis infection not being entered in the dataset as dying from cardiovascular diseases, and vice-versa.
Two other main components of total cholesterol, in addition to HDL cholesterol, are VLDL and LDL cholesterol. These are carried in particles, known as lipoproteins. VLDL cholesterol is usually represented as a fraction of triglycerides in cholesterol equations (e.g., the Friedewald and Iranian equations). It usually correlates inversely with HDL; that is, as HDL cholesterol increases, usually VLDL cholesterol decreases. Given this and the associations discussed above, it seems that LDL cholesterol is a good candidate for the possible “something else associated with total cholesterol that makes it protective”. But waidaminet! Is it possible that the demon particle, the LDL, serves any purpose other than giving us heart attacks?
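For readers unfamiliar with the Friedewald equation mentioned above, here is a minimal sketch in Python. The input values are hypothetical, and the function name is mine; the equation itself estimates VLDL cholesterol as triglycerides divided by 5 (in mg/dL):

```python
# Sketch of the Friedewald equation: VLDL cholesterol is approximated as
# a fraction of triglycerides (TG / 5 in mg/dL), so
# LDL = total cholesterol - HDL - TG / 5. Input values are hypothetical.

def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL); unreliable when TG >= 400 mg/dL."""
    if triglycerides >= 400:
        raise ValueError("Friedewald equation is unreliable for TG >= 400 mg/dL")
    vldl = triglycerides / 5.0  # VLDL approximated as a fraction of TG
    return total_chol - hdl - vldl

# Hypothetical example: TC = 180, HDL = 55, TG = 100
print(friedewald_ldl(180, 55, 100))  # 105.0
```

Note how HDL, VLDL, and LDL together account for total cholesterol, which is why total cholesterol can "carry" the effect of any one of its components.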
The graph below shows the shape of the association between total cholesterol (TOTCHOL) and mortality from all cardiovascular diseases (MVASC). The values are provided in standardized format; e.g., 0 is the average, 1 is one standard deviation above the mean, and so on. The curve is the best-fitting S curve obtained by the software (an S curve is a slightly more complex curve than a U curve).
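As a side note, the standardization used in the graph can be sketched as follows (the data values here are hypothetical, not the actual study data):

```python
# Minimal sketch of standardization: each value is expressed as the number
# of standard deviations it lies from the mean (a z-score), so 0 is the
# average and 1 is one standard deviation above it. Hypothetical values.
import statistics

def standardize(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]

data = [120, 140, 160, 180, 200]   # hypothetical total cholesterol values
z = standardize(data)
print(z[2])  # the middle value equals the mean, so its z-score is 0.0
```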
The graph below shows some of the data in unstandardized format, and organized differently. The data is grouped here in ranges of total cholesterol, which are shown on the horizontal axis. The lowest and highest ranges in the dataset are shown, to highlight the magnitude of the apparently protective effect. Here the two variables used to calculate mortality from all cardiovascular diseases (MVASC; see “Notes” at the end of this post) were added. Clearly the lowest mortality from all cardiovascular diseases is in the highest total cholesterol range, 172.5 to 180; and the highest mortality in the lowest total cholesterol range, 120 to 127.5. The difference is quite large; the mortality in the lowest range is approximately 3.3 times higher than in the highest.
The shape of the S-curve graph above suggests that there are other variables that are confounding the results a bit. Mortality from all cardiovascular diseases does seem to generally go down with increases in total cholesterol, but the smooth inflection point at the middle of the S-curve graph suggests a more complex variation pattern that may be influenced by other variables (e.g., smoking, dietary patterns, or even schistosomiasis infection; see “Notes” at the end of this post).
As mentioned before, total cholesterol is strongly influenced by HDL cholesterol, so below is the model with only HDL cholesterol (HDLCHOL) pointing at mortality from all cardiovascular diseases (MVASC), and the control variable sex (SexM1F2).
The graph above confirms the assumption that HDL’s protective role is subsumed by the variable total cholesterol. When the variable total cholesterol is removed from the model, as was done above, the protective effect of HDL cholesterol becomes significant (beta=-0.27; P<0.01). The control variable sex (SexM1F2) was retained even in this targeted HDL effect model because of the expected confounding effect of sex; females generally tend to have higher HDL cholesterol and less cardiovascular disease than males.
Below, in the “Notes” section (after the “Reference”) are several notes, some of which are quite technical. Providing them separately hopefully has made the discussion above a bit easier to follow. The notes also point at some limitations of the analysis. This data needs to be analyzed from different angles, using multiple models, so that firmer conclusions can be reached. Still, the overall picture that seems to be emerging is at odds with previous beliefs based on the same dataset.
What could be increasing the apparently protective HDL and total cholesterol in this dataset? High consumption of animal foods, particularly foods rich in saturated fat and cholesterol, is a strong candidate. Low consumption of vegetable oils rich in linoleic acid, and of foods rich in refined carbohydrates, is also a good candidate. Maybe it is a combination of these.
We need more analyses!
Reference:
Kock, N. (2010). WarpPLS 1.0 User Manual. Laredo, Texas: ScriptWarp Systems.
Notes:
- The path coefficients (indicated as beta coefficients) reflect the strength of the relationships; they are a bit like standard univariate (or Pearson) correlation coefficients, except that they take into consideration multivariate relationships (they control for competing effects on each variable).
- The R-squared values reflect the percentage of explained variance for certain variables; the higher they are, the better the model fit with the data. In complex and multi-factorial phenomena such as health-related phenomena, many would consider an R-squared of 0.20 as acceptable. Still, such an R-squared would mean that 80 percent of the variance for a particular variable is unexplained by the data.
- The P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the assumption that the data is normally distributed to be met. This and other related techniques also tend to yield more reliable results for small samples, and samples with outliers (as long as the outliers are “good” data, and are not the result of measurement error).
- Colinearity is an important consideration in models that analyze the effect of multiple predictors on a single variable. This is particularly true for multiple regression models, where there is a temptation to add many predictors to the model to see which ones come out as the “winners”. This often backfires, as colinearity can severely distort the results. Some multiple regression techniques, such as automated stepwise regression with backward elimination, are particularly vulnerable to this problem. Colinearity is not the same as correlation, and thus is defined and measured differently. Two predictor variables may be significantly correlated and still have low colinearity. A reasonably reliable measure of colinearity is the variance inflation factor. Colinearity was tested in this model, and was found to be low.
- An effort was made here to avoid multiple data points per county (even though this was available for some variables), because this could artificially reduce the variance for each variable, and potentially bias the results. The reason for this is that multiple answers from a single county would normally be somewhat correlated; a higher degree of intra-county correlation than inter-county correlation. The resulting bias would be difficult to control for, via one or more control variables. With only two data points per county, one for males and the other for females, one can control for intra-county correlation by adding a “dummy” sex variable to the analysis, as a control variable. This was done here.
- Mortality from schistosomiasis infection (MSCHIST) is a variable that tends to affect the results in a way that makes it more difficult to make sense of them. Generally this is true for any infectious diseases that significantly affect a population under study. The problem with infection is that people with otherwise good health or habits may get the infection, and people with bad health and habits may not. Since cholesterol is used by the human body to fight disease, it may go up, giving the impression that it is going up for some other reason. Perhaps instead of controlling for its effect, as done here, it would have been better to remove from the analysis those counties with deaths from schistosomiasis infection. (See also this post, and this one.)
- Different parts of the data were collected at different times. It seems that the mortality data is for the period 1986-88, and the rest of the data is for 1989. This may have biased the results somewhat, even though the time lag is not that long, especially if there were changes in certain health trends from one period to the other. For example, major migrations from one county to another could have significantly affected the results.
- The following measures, like the others, come from this online dataset: P002 HDLCHOL, for HDLCHOL; P001 TOTCHOL, for TOTCHOL; and M021 SCHISTOc, for MSCHIST.
- SexM1F2 is a “dummy” variable that was coded with 1 assigned to males and 2 to females. As such, it essentially measures the “degree of femaleness” of the respondents. Being female is generally protective against cardiovascular disease, a situation that reverses itself a bit after menopause.
- MVASC is a composite measure of the two following variables, provided as component measures of mortality from all cardiovascular diseases: M058 ALLVASCb (ages 0-34), and M059 ALLVASCc (ages 35-69). A couple of obvious problems: (a) they do not include data on people older than 69; and (b) they seem to capture a lot of diseases, including some that do not seem like typical cardiovascular diseases. A factor analysis was conducted, and the loadings and cross-loadings suggested good validity. Composite reliability was also good. So essentially MVASC is measured here as a “latent variable” with two “indicators”. Why do this? The reason is that it reduces the biasing effects of incomplete data and measurement error (e.g., exclusion of folks older than 69). By the way, there is always some measurement error in any dataset.
- This note is related to measurement error in connection with the indicators for MVASC. There is something odd about the variables M058 ALLVASCb (ages 0-34), and M059 ALLVASCc (ages 35-69). According to the dataset, mortality from cardiovascular diseases for ages 0-34 is typically higher than for 35-69, for many counties. Given the good validity and reliability for MVASC as a latent variable, it is possible that the values for these two indicator variables were simply swapped by mistake.
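As an aside, the delete-one jackknifing mentioned in the notes can be sketched roughly as follows. This is a generic illustration with hypothetical data, not WarpPLS’s actual implementation: the statistic is recomputed with each observation left out, and the spread of those estimates yields a standard error without assuming normality.

```python
# Sketch of delete-one jackknifing: recompute the statistic on each
# leave-one-out subsample, then derive a standard error from the spread
# of those estimates. Data are hypothetical.
import math

def jackknife_se(data, statistic):
    n = len(data)
    # Recompute the statistic with each observation left out in turn
    estimates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_est = sum(estimates) / n
    variance = (n - 1) / n * sum((e - mean_est) ** 2 for e in estimates)
    return math.sqrt(variance)

sample = [4.2, 5.1, 3.9, 6.0, 5.5, 4.8]  # hypothetical data
se_of_mean = jackknife_se(sample, lambda xs: sum(xs) / len(xs))
print(round(se_of_mean, 4))
```

For the sample mean, this reproduces the familiar s/√n standard error; for more complex statistics, such as path coefficients, it provides an estimate that would otherwise be hard to derive analytically.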
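Also as an aside, with only two predictors the variance inflation factor mentioned in the notes reduces to 1/(1 - r²), where r is the correlation between the predictors. A rough sketch with hypothetical values (not the actual study data):

```python
# Sketch of the variance inflation factor for the two-predictor case:
# VIF = 1 / (1 - r^2), where r is the correlation between the predictors.
# All values below are hypothetical.
import math

def vif_two_predictors(x1, x2):
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2))
    r = cov / (s1 * s2)
    return 1.0 / (1.0 - r ** 2)

hdl = [35, 40, 42, 50, 55, 60]       # hypothetical predictor values
tot = [150, 170, 155, 180, 160, 175]
v = vif_two_predictors(hdl, tot)
print(round(v, 2))  # well below common cutoffs of 5 or 10, i.e., low colinearity
```

This also makes the note’s distinction concrete: two predictors can be meaningfully correlated while their VIF stays low.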
Labels:
cardiovascular disease,
China Study,
cholesterol,
HDL,
LDL,
multivariate analysis,
research,
statistics,
VLDL,
warppls