I have been a long-time user of Fedora at work and have been quite happy with it. Around six months ago, I moved to Manjaro under VirtualBox on a Windows host, because the company imposes the use of a VPN client that does not run on Linux. It is much less great, for several reasons:
VirtualBox makes everything graphics-related much slower. It is still way better than WSL2, which I found to be a poor man’s Linux, where graphics goes through some half-broken X11 client for Windows. With VirtualBox, programming is OK; watching videos is not (you need to use the host for that). Something relatively neat is that the VM can be run on any other host, including a Linux host, just by copying the VM file.
I picked Manjaro out of curiosity and because it was perhaps faster/more optimized than Fedora. I don’t have too many issues with it, but I found that AUR packages would frequently break on upgrades. Furthermore, it is updated less frequently than Fedora, which is a bit surprising for a rolling release. I also installed it with ext4. In hindsight, I think both choices were poor: Fedora with Btrfs is the way to go, or some Ubuntu variant.
Work also imposes various antivirus programs, which tend to keep the CPU constantly busy.
Overall, I am still much more productive than with raw Windows, which is surprisingly slow for many software-development-related tasks. Having to deal more with Windows showed me how slow the system is, and how poor the experience is. Before using it again (after many years completely off it), I had started thinking it was good (well, it does crash much less than back in the day).
I otherwise use Fedora with either KDE or GNOME on my personal desktop and laptop, with very few issues overall. GNOME seems much better since version 42, even if GNOME Shell has not been the most stable in the past. Steam runs great. I also encountered a GRUB bootloader bug recently, which disappeared once I upgraded my BIOS.
This is a follow-up to my previous post on rough volatility. I recently tried to reproduce the results of the paper Rough Volatility: Fact or Artefact?, as I was curious to apply the technique to different inputs. The 5-minute SPX realized volatility is freely available in CSV format from the Oxford-Man Institute of Quantitative Finance, so it is relatively straightforward to reproduce the numbers presented in the paper.
Using a sampling of K=75 and L=75x75, I obtain a roughness index H=0.181. The paper uses K=70 and L=70x70, and its Figure 19 states H=0.187.
It turns out that there are more than L observations in the time series: with K=70, I end up with a roughness index H=0.222 when I use the first observation (year 2000) up to observation L+1. But it is possible to slide this window and compute the roughness index at each starting point. The results are enlightening.
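For illustration, here is a minimal Julia sketch of such a sliding-window estimate. It relies on a simple Gatheral-Rosenbaum style log-log regression of the q-th absolute moment of log-volatility increments against the lag, rather than the exact estimator of Cont and Das used for the figures; the lag range and q below are arbitrary choices.

using Statistics

# Roughness (Hurst) index of a log-volatility sample: regress
# log m(q,Δ) = log mean(|logvol[t+Δ] - logvol[t]|^q) on log Δ; the slope is ζ_q ≈ q·H.
function roughness_index(logvol::AbstractVector; q=2.0, lags=1:20)
    y = [log(mean(abs.(logvol[1+Δ:end] .- logvol[1:end-Δ]) .^ q)) for Δ in lags]
    x = log.(lags)
    return (cov(x, y) / var(x)) / q   # OLS slope divided by q
end

# Roughness estimates over all sliding windows of length `window` of a volatility series `vol`.
sliding_roughness(vol::AbstractVector, window::Int) =
    [roughness_index(log.(vol[s:s+window-1])) for s in 1:length(vol)-window+1]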
SPX historical vol roughness index sliding estimate with K=70.
density of SPX historical vol roughness index estimate with K=70. The mean is 0.158 with a standard deviation of 0.034
The mean estimated roughness is 0.158 with a standard deviation of 0.034. The results of the paper are within one standard deviation, but this shows that it may be dangerous to rely on a single window for the estimate. If we use a smaller window size, K=50 and L=50x50, we can see regimes of roughness in time.
SPX roughness index sliding estimate with K=50.
density of SPX roughness index estimate with K=50. The mean is 0.145 with a standard deviation of 0.048
I also double-checked that the SPX closing prices result in an estimate H=0.5 for the two window sizes: the SPX price process follows standard Brownian motion statistics.
The paper explains that the 5-minute realized volatility is not necessarily a good proxy for measuring the true underlying instantaneous volatility roughness. I then wondered what happens if we use daily quotes of the short-term at-the-money (ATM) implied volatility as a proxy. Indeed, in a given stochastic volatility model, the short-term ATM implied volatility is approximately the instantaneous volatility of the stochastic volatility process. It turns out that there is some freely available data from the Nasdaq for MSFT, but only for the period 2015-01-02 to 2018-12-31 (four years). This constitutes 1006 samples. For the 30-day implied volatility, with K=28, the data implies a roughness index close to 0.5 (0.448).
density of MSFT 30-day implied volatility roughness index estimate. The mean is 0.448 with a standard deviation of 0.059
density of MSFT 10-day implied volatility roughness index estimate. The mean is 0.350 with a standard deviation of 0.049
Roughness is apparent in the 10-day implied volatility, with H=0.35. For the 60-day implied volatility, I measure a roughness H=0.41 with a standard deviation of 0.06, so again not significantly different from the standard Brownian motion roughness. The roughness of the implied volatility skew, also available in this data, is more pronounced.
Another interesting analysis is the roughness of the VIX index, which can be mapped to the price of a newly issued 30-day variance swap on the SPX index. I am actually slightly surprised that the various papers on rough volatility have not looked at it already. I just took the quotes from Yahoo. With K=51, I find H=0.347 with a standard deviation of 0.026. This is close to the 10-day implied volatility roughness measured previously.
VIX roughness index sliding estimate with K=51.
density of VIX roughness index estimate with K=51. The mean is 0.347 with a standard deviation of 0.026
What about the Heston model? What is the estimate of the roughness index on a simulated stochastic variance path? Using one time-step per day for 5 years and K=25, we find H=0.505 with a standard deviation of 0.032 for the roughness of the instantaneous variance, as well as for that of the one-month at-the-money implied volatility (using v(t) and constant Heston parameters). This is, as expected, the standard roughness of the underlying Brownian motion.
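As a sanity check of this expectation, here is a rough sketch of the experiment, reusing the roughness_index function sketched earlier. The full-truncation Euler scheme and the parameters are illustrative only, not necessarily those used for the figures.

# Daily-sampled Heston (CIR) variance path over 5 years, full-truncation Euler scheme.
function heston_variance_path(; v0=0.04, κ=1.0, θ=0.04, σ=0.5, years=5, stepsPerYear=252)
    n = years * stepsPerYear
    Δt = 1.0 / stepsPerYear
    v = Vector{Float64}(undef, n + 1)
    v[1] = v0
    for i in 1:n
        vp = max(v[i], 0.0)   # truncate the variance at zero inside the drift and diffusion
        v[i+1] = v[i] + κ * (θ - vp) * Δt + σ * sqrt(vp * Δt) * randn()
    end
    return v
end

# Roughness of the simulated instantaneous volatility; this should come out around H ≈ 0.5.
roughness_index(log.(sqrt.(max.(heston_variance_path(), 1e-12))))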
To conclude, the Hurst index H of the implied volatility and of the VIX index appears not to be as low as that of the 5-minute historical volatility. The estimated roughness H=0.35 is much closer to the standard Brownian motion roughness. In fact, for the 30-day implied volatility (with daily quotes), the measure is even closer to 0.5.
Looking at the 5-minute historical volatility may make sense for pricing intraday options, but it is not clear how important it is for traditional derivatives pricing (with maturities longer than a week). Does the historical microstructure really matter there?
After writing all that, I found the paper of Livieri et al. (2017), Rough volatility: evidence from option prices, which also measures the roughness of implied volatilities. Interestingly, I did not reproduce some of the claims in that paper: I find a large bias in the implied volatility proxy even for a low H value. The large bias, combined with the uncertainty in the measurement, makes the implied volatility proxy not very practical for estimating the true Hurst index of the underlying volatility process. Another discrepancy with that paper is around the Heston model: I do not find any apparent roughness at all in the Heston model.
I added this post, with more details, as a note on arXiv, just in case someone needs to reference it. I intend to put some of the Julia code I wrote in the context of the paper on GitHub, although I have not found the time yet.
It is well known that the assumption of constant volatility in the Black-Scholes model for pricing financial contracts is wrong and may lead to serious mispricing, especially for exotic derivative contracts.
A classic approach is to use a deterministic local volatility model to take into account the variation in both the time dimension and the underlying asset price dimension. But the model is still not realistic in terms of forward smile (the implied volatilities of forward-starting options). A stochastic volatility component must be added to correct for it.
More recently, the concept of rough volatility has emerged in many academic papers. Instead of a classic Brownian motion, a fractional Brownian motion is used for the stochastic volatility process. The idea of using a fractional Brownian motion for financial time series can be traced back to Mandelbrot, but it is only relatively recently that it has been proposed for the volatility process (and not the stock process).
After many published papers on the subject, Rama Cont and Purba Das recently added their preprint Rough Volatility: Fact or Artefact? on SSRN, raising the question of whether using a rough volatility process makes sense from a practical point of view. Indeed, they show that the measured Hurst index differs between the realized volatility (even based on very short sampling intervals) and the instantaneous volatility. The authors conjecture that it is all due to the discretization error (a.k.a. the microstructure noise), and that, in fact, a standard Brownian motion is compatible with the low Hurst index of the realized volatility.
As mentioned in their conclusion, they are not alone in suggesting that the rough volatility assumption may not be justified: L.C.G. Rogers also published a 2019 note, Things we think we know, with the same conclusion.
In a 2019 paper, Masaaki Fukasawa, Tetsuya Takabatake and Rebecca Westphal also attempt to answer the question Is volatility rough? They present the same problem with the estimation method of Rosenbaum and Gatheral, namely that their measure may not actually measure the true Hurst index, and this could be why it is always close to H=0.1, regardless of the market index. They show that, for specific models, the Rosenbaum and Gatheral measure depends on the period length considered (5-minute vs. 1-minute sampling leads to different Hurst indices). The authors therefore devise a more robust (and much more complex) way to estimate the Hurst index of time series, one which is not so sensitive to the period length considered. They find that the market volatility, measured through the realized variance proxy, is rough, with H=0.04, rougher than what Jim Gatheral suggested.
To conclude, the study of Gatheral and Rosenbaum, which motivated the use of rough volatility, was based on “biased” estimates of the Hurst index of the realized volatility of high-frequency data, as evidenced in the three papers cited above. Clearly then, this justification does not hold. But Fukasawa et al. suggest the volatility is rougher still, and it is slightly surprising that Rama Cont and Purba Das do not take those results more into account. They briefly discuss them, but then go on to propose another biased estimate (one which seems to lead to the same results as Rosenbaum and Gatheral). To their credit, Rama Cont and Purba Das show that the microstructure noise is far from IID, while Fukasawa et al. assume it is close to IID in their method. Perhaps their indirect conclusion is that the Fukasawa et al. estimates are also biased (i.e. measuring something else), although the results presented in their paper seem solid.
Now, there are also other interesting features of rough volatility models, such as the variation of the at-the-money skew in time, which mimics the market implied skews in practice, a feature not easily achieved by stochastic volatility models (this typically requires a multidimensional stochastic volatility process to approach it crudely).
When writing a Monte-Carlo simulation to price financial derivative contracts, the most straightforward approach is to code a loop over the number of paths, in which each path is fully computed. Inside the loop, a payoff function takes this path and computes the present value of the contract on the given path. The present values are recorded to produce the Monte-Carlo statistics (mean, standard deviation).
I ignore here any callability of the payoff, which may still be addressed with some workarounds in this setup. The idea can be schematized by the following Go code:
for i := 0; i < numSimulations; i++ {
    pathGenerator.ComputeNextPath(&path) // path contains an array of n time-steps of float64
    pathEvaluator.Evaluate(&path, output)
    statistics.RecordValue(output)
}
A Python programmer would likely not write a simulation this way, as the Python code inside the large loop would be slow. Instead, the Python programmer would write a vectorized simulation, generating all paths at the same time:
pathGenerator.ComputeAllPaths(&paths) // paths contains an array of n time-steps of vectors of size numSimulations
pathEvaluator.EvaluateAll(&paths, output)
statistics.RecordValues(output)
The latter approach sounds nice but may easily blow up the memory if all paths are stored in memory. One solution is to store only packages of paths and run the simulation multiple times, being careful about where we start the random numbers along the way. It is possible to further optimize memory by not keeping each full path in memory, but only the current path values, along with a list of vector variables which track the payoff path-dependency. This requires a special Brownian-bridge implementation in the context of Sobol quasi-random numbers, as described in Jherek Healy’s book.
We can thus summarize the various approaches as (A) iterative on scalars, (B) fully vectorized, (C) iterative on vectors. I intentionally abuse the language here, as the scalars are not really scalars but one path comprised of values at all relevant times; in this terminology, scalar means a single path.
Which one is most appropriate?
I first heard about this debate at a QuantLib user meeting a long time ago. Alexander Sokol was arguing for (B), as it simplified the implementation of AAD, while Peter Caspers was more of an (A) kind of guy. At the time, I had not fully grasped the interest of (B), as it seemed more complex to put in place properly.
It is interesting to think about these in terms of parallelization. At first, (A) may seem simple to parallelize: you just process N/M packages, where M is the number of processors and N the total number of paths. Unfortunately, such a design implies a stateful payoff, which is not necessarily trivial to make thread-safe (actually, very challenging). So the first drawback is that you may have to limit the parallelization to the path generation and perform the payoff evaluation in a single thread. And if you think about it more, you notice that while the first package is being computed, the payoff evaluation just waits there. Sure, other packages will be computed in parallel, but this may again blow up the memory, as you essentially keep all the paths of each package in memory. In practice, we would thus need to compute each path in some sort of thread pool and push the paths to a path queue of fixed length (blocking), which is read by a single thread that takes a path from the queue, evaluates the payoff on it, and pushes it back to a free queue in order to recycle the path. Alternatively, some sort of ring-buffer structure could be used for the paths. One also has to be quite careful about the random number generator skipping to the correct position, in order to keep the same sequence of random numbers as in a fully single-threaded approach.
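To make this concrete, here is a minimal Julia sketch of that queue-based design; computePath! and evaluatePayoff stand in for the pathGenerator and pathEvaluator of the pseudo-code above, and the free-queue recycling of paths is omitted for brevity.

using Base.Threads, Statistics

# Parallel path generation feeding a single-threaded, possibly stateful payoff evaluator.
# A bounded Channel plays the role of the fixed-length (blocking) path queue.
function simulate(computePath!, evaluatePayoff, numSimulations, pathLength; queueSize=64)
    queue = Channel{Vector{Float64}}(queueSize)        # blocks producers when full, bounding memory
    chunks = collect(Iterators.partition(1:numSimulations, cld(numSimulations, nthreads())))
    producers = map(chunks) do chunk
        Threads.@spawn for i in chunk
            path = Vector{Float64}(undef, pathLength)
            computePath!(path, i)                      # the generator skips its RNG to the position of path i
            put!(queue, path)
        end
    end
    @async (foreach(wait, producers); close(queue))    # close the queue once all producers are done
    values = Float64[]
    for path in queue                                  # single consumer: the payoff state stays in one thread
        push!(values, evaluatePayoff(path))
    end
    return mean(values), std(values)
end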
Taking parallelization into account, (A) is not that trivial anymore. In contrast, (B) or (C) parallelize the payoff evaluation by design, making the implementation of the full simulation parallelization much easier. Arguably, the packages-of-paths approach may consume less memory than (C), since (1) the path generation usually involves many intermediate time-steps which may be discarded (kept only in cache per core), while in a vectorized approach the full vector of those intermediate time-steps may be needed (typically for a classical Brownian-bridge variance reduction), and (2) the payoff evaluates scalars (its state is thus comprised of scalars). A clever vectorized payoff evaluation would, however, reuse most vectors, so the memory reduction may not materialize in practice.
Now, if we include callability or particle-method kinds of evaluation, full vectorization, while keeping only the payoff (vector) variables in memory and building up the path step by step, may be the most effective, as described in Jherek Healy’s book.
To conclude, not vectorizing the path generation and payoff evaluation would be a design mistake nowadays.
This blog post from Jherek Healy presents some not-so-obvious behavior of automatic differentiation when a function is decomposed into the product of two parts, where one part goes to infinity and the other to zero, while we know the overall result must go to zero (or to some other specific number). This decomposition may be relatively simple to handle for the value of the function, but it becomes far less trivial to think of in advance at the derivative level.
I played with Julia’s excellent ForwardDiff package myself, and found a completely different example illustrating a simple and easy trap with automatic differentiation. The example is a typical cubic spline evaluation routine:
function evaluate(self, z)
    i = searchsortedfirst(self.x, z)  # x[i-1] < z <= x[i]
    if z == self.x[i]
        return self.a[i]
    end
    if i > 1
        i -= 1
    end
    h = z - self.x[i]
    return self.a[i] + h * (self.b[i] + h * (self.c[i] + h * self.d[i]))
end
ForwardDiff will report a derivative of 0 at z = x[i] while, in reality, it is not 0: the derivative is b[i]. The case z == x[i] is just a (not so clever) optimization.
Granted, it is wrong only at the spline knots, but if those correspond to key points evaluated as part of a minimization, it might become very problematic.
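A minimal way to see it, assuming a small hypothetical CubicSpline container matching the evaluate routine above (the knots and coefficients below are made up purely for the illustration):

using ForwardDiff

struct CubicSpline
    x::Vector{Float64}
    a::Vector{Float64}
    b::Vector{Float64}
    c::Vector{Float64}
    d::Vector{Float64}
end

spline = CubicSpline([0.0, 1.0, 2.0], [1.0, 2.0, 0.5], [0.3, -0.1, 0.2], zeros(3), zeros(3))

ForwardDiff.derivative(z -> evaluate(spline, z), 1.0)        # 0.0: the z == x[i] branch returns a plain constant
ForwardDiff.derivative(z -> evaluate(spline, z), 1.0 + 1e-9) # ≈ -0.1 = b[2], the actual right derivative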
Another interesting remark on automatic differentiation concerns the choice of backward (reverse) vs. forward (tangent) mode: we often read that backward is more appropriate when the dimension of the output is low compared to the dimension of the input, and forward when the dimension of the output is large compared to the dimension of the input.
This is however not strictly true. A good example is a least-squares minimization. The objective could be seen as a one-dimensional output: the sum of squares of the values of an underlying function. But in reality it is really N-dimensional, where N is the number of terms in the sum, as one sum involves N evaluations of the underlying function. For least-squares, backward automatic differentiation will thus likely be slower than forward, since the number of terms in the sum is typically larger than the number of input parameters we are trying to fit.
The documentation of MATLAB’s lsqnonlin concurs: “lsqnonlin defaults to forward AD when the number of elements in the objective vector is greater than or equal to the number of variables. Otherwise, lsqnonlin defaults to reverse AD.”
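As a small Julia illustration on a made-up exponential fit (the model and data below are toy examples), the natural AD “output” is the residual vector, not the scalar sum of squares:

using ForwardDiff

# Toy least-squares problem: fit y ≈ p[1] * exp(-p[2] * t). N = 100 outputs, only 2 input parameters.
t = range(0.0, 1.0, length=100)
y = 2.0 .* exp.(-3.0 .* t) .+ 0.01 .* randn(100)
residuals(p) = p[1] .* exp.(-p[2] .* t) .- y
p0 = [1.0, 1.0]
J = ForwardDiff.jacobian(residuals, p0)   # 100×2 Jacobian, cheap in forward mode since there are only 2 inputs
g = 2 .* (J' * residuals(p0))             # gradient of the scalar objective sum(residuals.^2)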
As described on Wikipedia, a quadratic programming problem with n variables and m constraints is of the form
$$ \min_x \left(-d^T x + \frac{1}{2} x^T D x\right) $$ subject to the constraints \( A^T x \geq b_0 \), where \(D\) is an \(n \times n\) real symmetric matrix, \(A\) is an \(n \times m\) real matrix, \( b_0 \) is an \(m\)-dimensional vector of constraints, \( d \) is an \(n\)-dimensional vector, and the variable \(x\) is an \(n\)-dimensional vector.
Solving convex quadratic programming problems happens to be useful in several areas of finance. One application is to find the set of arbitrage-free option prices closest to a given set of market option prices, as described in An arbitrage-free interpolation of class C2 for option prices (also published in the Journal of Derivatives). Another application is portfolio optimization.
In a blog post dating from 2018, Jherek Healy found that the sparse solve_qp solver of Scilab was the most efficient across various open-source alternatives. The underlying algorithm is actually Berwin Turlach’s quadprog, originally coded in Fortran and available as an R package. I had used this algorithm to implement the techniques described in my paper (and even proposed a small improvement regarding the treatment of machine-epsilon accuracy, now included in the latest version of quadprog).
Julia, like Python, offers several good convex optimizers, but those support a richer set of problems than the basic standard quadratic programming problem. As a consequence, they are not optimized for our simple use case. Indeed, I have ported the algorithm to Julia and found a 100x performance increase over the COSMO solver on the closest arbitrage-free option prices problem. A table summarizing the results is detailed on the GitHub repo main page.
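For reference, here is roughly how such a problem is expressed with one of those generic packages, COSMO through JuMP; the data D, d, A, b0 below is a trivial toy instance:

using JuMP, COSMO, LinearAlgebra

# Toy instance of min -d'x + 1/2 x'Dx subject to A'x >= b0.
n = 3
D = Matrix(3.0I, n, n)          # n×n symmetric positive definite
d = [1.0, 2.0, 3.0]
A = Matrix(1.0I, n, n)          # here m = n, so the constraints read x .>= b0
b0 = zeros(n)

model = Model(COSMO.Optimizer)
@variable(model, x[1:n])
@objective(model, Min, -d' * x + 0.5 * x' * D * x)
@constraint(model, A' * x .>= b0)
optimize!(model)
value.(x)                        # for this toy data, simply D \ d = [1/3, 2/3, 1]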
In the context of my thesis, I explored the use of stochastic collocation to capture the marginal densities of a positive asset.
Indeed, most financial asset prices must be non-negative, but the classic stochastic collocation towards a normally distributed random variable is not.
A simple tweak, proposed early on by Grzelak, is to assume absorption and use the put-call parity to price put options (which otherwise depend on the left tail).
This sort of works most of the time, but a priori, there is no guarantee that we will end up with a positive put option price.
As an extreme example, we may consider the case where the collocation price formula leads to \(V_{\textsf{call}}(K=0) < f \) where \(f \) is the forward price to maturity.
The put-call parity relation applied at \(K=0 \) leads to \(V_{\textsf{put}}(K=0) = V_{\textsf{call}}(K=0)-f < 0 \). This means that for some strictly positive strike, the put option price will be negative, which is non-sensical.
In reality, it thus implies that absorption must happen earlier, not at \(S=0 \) but at some strictly positive asset price. And then it is not so obvious to choose the right value in advance.
In the paper, I look at alternative ways of expressing the absorption, which do not have this issue. Intuitively however, one may wonder why we would go through the hassle of considering a distribution which may end up negative to model a positive price.
In finance, the assumption of a lognormal distribution of price returns is very common. The most straightforward approach would thus be to collocate towards a lognormal variate (instead of a normal one), and use an increasing polynomial map from \([0, \infty) \) to \([0, \infty) \).
There is no real numerical challenge in implementing the idea. However, it turns out not to work well, for reasons explained in the paper, one of them being a lack of invariance with respect to the lognormal distribution volatility.
This did not make it directly into the final thesis, because it was not core to it. Instead, I explore absorption with the Heston driver process (although, in hindsight, I should have just considered mapping 0 to 0 in the monotonic spline extrapolation). I recently added the paper on the simpler case of positive collocation with a normal or lognormal process to arXiv.
Steven Koonin, who was Under Secretary for Science at the Department of Energy in the Obama administration, recently wrote a somewhat controversial book on climate science titled Unsettled. I was curious to read what kind of criticism a physicist who partly worked in the field had, even though I believe that climate warming is real and that humans have an influence on it. It turns out that some of his remarks regarding models are relevant way beyond climate science, but other subjects are not as convincing.
The first half of the book explains how the influence of humans on climate is not as obvious as many graphs suggest, even those published by well-known groups such as the CSSR or the IPCC.
A recurring theme is plots presented without context or with the wrong context. Many times, when we extend the time span into the past, a very different picture emerges, of cyclic variations with cycles of different lengths and amplitudes. Other times, the choice of measure is not appropriate. Yet another aspect is the uncertainty of the models, which seems too rarely discussed in reports and papers, as well as the necessary tweaking of model parameters.
When it comes to CO2, Koonin explains that the decrease in cooling power is much, much lower from 400 ppm to 800 ppm than from 0 ppm to 400 ppm because, at 400 ppm (the 2019 average), most of the relevant light frequencies are already absorbed. The doubling corresponds to only a 0.8% change in intercepted heat. 1% is however not so small, as it corresponds to a variation of about 3°C. Furthermore, while humans may increase warming with greenhouse gases, they also increase cooling via aerosols, the latter being much less well known and measured (with those taken into account, we arrive at the 1°C impact). The author also presents some plots of CO2 on a scale of millions of years to explain that we live in an era of low CO2 ppm. While an interesting fact, this may be a bit misleading since, if we look up to 800,000 years back, the CO2 concentration stayed between 200 and 250 ppm, with an explosion only in the recent decades. The climate.gov website provides more context around the latter plot:
In fact, the last time the atmospheric CO₂ amounts were this high was more than 3 million years ago, when temperature was 2°–3°C (3.6°–5.4°F) higher than during the pre-industrial era, and sea level was 15–25 meters (50–80 feet) higher than today.
On this plot, the relation between the temperature increase and the CO2 ppm looks linear, which would be somewhat more alarming.
The conclusion of the first half of the book may be summarized as follows: while climate warming is real, it is not so obvious how much of it is due to humans. The author agrees with maybe around 1 degree Celsius, and suggests it is not necessarily accelerating. The causes are much less clear. There are some interesting subtleties: temperature increases in some places and decreases in others; around cities, the increase is mostly due to the expansion of the cities (more buildings).
I found the second half to be in contradiction with the first half, although that is clearly not the author’s intent: the second half focuses on how to address humans’ influence on the climate and several times suggests a strong influence of human emissions on the climate, while the first half of the book was all about minimizing that aspect. This is especially true for carbon emissions, where the first half suggests that additional emissions will have a comparatively small impact.
The overall message is relatively simple: reducing emissions won’t help much, as the concentration will only increase in the coming decades (but then, would it be so bad to think beyond the coming decades?). Also, the scale of emission reductions necessary for a minimal increase (2 degrees) is not realistic at all in the world we live in. Instead, we would be better off trying to adapt.
Overall, the author denounces issues of scientific integrity, which is too often absent or not strongly present enough. Having reviewed many papers, and published some in specialized journals, I can’t say I am surprised. Peer review is important, but perhaps not good enough by itself, especially in the short run. Over decades, I think the system may correct itself, however.
GitHub recently moved to support only SSH access via public/private keys. As I use GitHub to host this blog, I was impacted.
The setup on Linux is not very complicated and relatively well documented on GitHub itself, but the steps are not all listed in one simple place, and some googling is still required to find out how to set up multiple private keys for different servers or different repositories.
Here is my short guide. In bash, do the following:
ssh-keygen -t ed25519 -C "your_email@example.com"
Make sure to give it a specific name, e.g. /home/you/.ssh/id_customrepo_ed25519
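For the multiple-keys case, the part that is easy to miss is one entry per host alias in ~/.ssh/config, so that ssh picks the right key; the alias and repository names below are just examples:

Host github.com-customrepo
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_customrepo_ed25519
    IdentitiesOnly yes

Then add the public key (the .pub file) to your GitHub account or repository settings, and point the local clone at the alias:

git remote set-url origin git@github.com-customrepo:yourname/yourrepo.git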
One thing that motivated me to get vaccinated is the fake news propaganda against the Covid-19 vaccines.
A mild example relates to the data from Israel about the Delta variant. This kind of article, with the title “Covid 19 Case Data in Israel, a Troubling Trend”, puts the emphasis on doubts about the effectiveness of the vaccine:
the vaccine appears to have a negligible effect on an individual as to whether he/she catches the current strain. Moreover, the data indicates that the current vaccines used (Moderna, Pfizer-BioNTech, AstraZeneca) may have a decreasing effect on reduced hospitalizations and death if one does get infected with the Delta variant.
CNBC (far from the best news site) is, in contrast, much more balanced and precise:
However, the two-dose vaccine still works very well in preventing people from getting seriously sick, demonstrating 88% effectiveness against hospitalization and 91% effectiveness against severe illness, according to the Israeli data published Thursday.
Much more shocking is a tweet from Prashant Bhushan (whoever that is), which was relayed widely enough that I got to see it:
Public Health Scotland have revealed that 5,522 people have died within twenty-eight days of having a Covid-19 vaccine within the past 6 months in Scotland alone! This is 9 times the people who died due to Covid from March 2020 till Jan 2021 in Scotland!
It points to an article in the Daily Expose (worse than a tabloid?) with a link to the actual study. Giving an actual link to the source is very good practice, even if, in this case, the authors of the news article write a lengthy text which says the opposite of the paper they cite as their source. Indeed, p. 27 of the paper says:
Using the 5-year average monthly death rate (by age band and gender) from 2015 to 2019 for comparison, 8,718 deaths would have been expected among the vaccinated population within 28 days of receiving their COVID-19 vaccination.
This means the observed number of deaths is lower than expected compared with mortality rates for the same time period in previous years…
Why do people write such fake news articles?
I can imagine 3 kinds of motivations:
The author wants to make money or become famous. Writing controversial things, even if completely false, seems to work wonders to attract readership, as long as there is a crowd of readers curious about the controversy. The author will easily gather page views and followers, which he may then monetize.
The author is someone so convinced of his beliefs that everything is interpreted through a very narrow band-pass filter. They will read and pay attention only to the words in the text that validate their beliefs and forget everything else. I know examples (not on this particular subject) in my own family.
Troll farms. There are many troll farms (we always hear of the Russian ones, but they really exist in most countries), where people are paid to spread disinformation on social media. A simple example is the fake reviews on Amazon. Sometimes the goal is to promote the ideas of a particular political party. Sometimes this political party is financed by a foreign country with specific interests.
The third motivation may act as an amplifier of the first two.
Why are fake news successful?
People do not trust the official government message. On the coronavirus, the French government was initially quite bad at presenting the situation, with the famous “masks are useless” (because they did not have any stock) a couple of months before enforcing masks everywhere. Too many decisions are purely political. A good example is the policy of closing a school class upon a single positive test when the full school is being tested for coronavirus: the false positive rate guarantees that at least a third of the classes will close. Initially the policy was three positive tests, which was much more rational, but not as easy to understand for the masses. My son ended up being one of those false positives, and it wasn’t a particularly nice experience.
People do not trust traditional media anymore. As the quality of standard newspapers dwindled with the rise of the internet, people do not trust them much anymore. Also, perhaps fairly, it is not clear whether classic traditional media present better news than free modern social media. There are exceptions of course (typically among monthly publications).
Social media bias: almost all social networks act as narrow band-pass filters. If you look at one cat video on YouTube, you will end up with videos of cats on your YouTube front page every day. Controversial subjects and videos are given considerable space. A main reason for this behavior is that these platforms are optimized for page views, to attract your attention.
Generalized attention deficit disorder due to the ubiquity of smartphones. Twitter became famous because of its 140-character message limit. Newspaper articles are perhaps five times shorter than they used to be in the early 1990s. The time we have is limited, and a lot of it is spent reading and writing short updates in various WhatsApp groups or other social network apps.
Conclusion
This was not the only motivator; another strong one is a Russian friend who caught Covid-19 in July 2021. He was not very pro-vaccination but thought, once it was too late, “I really should have applied for the vaccine”, as he realized it impacted his whole circle of relations. Then, one by one, his whole family caught Covid. Fortunately, no one died.