Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pre-2018 blog
This is a copy of the posts, on the Worpress platform, that were made in 2016 and 2017
March 1, 2016

We are interested in competitors performance.

Most PBPK simulators follow a pattern where a GUI is used to design a physiology model in order to construct ODEs that are solved by an ODE solver. The trick is to make it easy for the user to think in physiology terms, when she enters model’s parameters or reads simulation’s results, while at the same time enable the code to manage an ODE solver accordingly to this model.

Understanding what a solver really does is not so easy, particularly if the source code is not available. But even if it is available, what can be deduced is not very informative, as a model is something which is instantiated at run time, and not something written out in the source code. Most of simulator’s code is used in the GUI, the solver is often provided by a third party such as Numerical recipes ( Even for the GUI, there is reliance on existing libraries such as Java’s DefaultMutableTreeNode or JFreeChart. To understand what the solver does, one has to observe the user provided function that the ODE solver calls to progress from one step to the next.

We took a small free PBPK modeler. The literature about it is sparse but presents it in a favourable manner. The reader understands this software is a labour of love, minuscule details seems to be taken in account. Its GUI follows the form paradigm and is quite complex.

After decompiling the binaries, we logged each call to the user function, rebuilt the software and ran it with the default values. At the end, it was apparent that this simulator only uses 5 ODEs on only two compartments: Lung and liver. Nothing is computed about  kidney, muscle or heart even if their models are described with much details in this software’s literature.

August 24, 2016

PBPK simulators use a compartmental approach, where fluids are transferred between compartments and transformed inside them.

It is a very mechanistic approach, a successful one, but it ignores many important aspects of Mammalian biology such as the influence of the genome on health or the signaling between cells or throughout the organism, for example with the immune system.

Even the illness or simply the unhealthy human, is not implemented in models, rather they are “cases” that are hard-wired in the software.

It is well known there is a need to separate the model from the simulator, in order to make it possible to change some parameters or even the whole model at will. Every CellML or SBML simulator offers that kind of functionality.

It goes the same way for genetic information, not only it should be taken in account, but it should be separated and accessible in its own set of portable data. I do not know how SBML format would make it possible.

Cell or organism signaling should also be assigned to a distinct set of portable data. We have already something similar for fluids in our current simulator’s PoC, it is separated in a distinct XML file, something unfortunately not standardized.

Therefore we have to think how fluids, genetic information (and variants) as well as signaling or health will be taken in account in future versions of the PoC of our simulator.

In addition we have to offer a multi-faceted GUI, for example a human diabetic model and a dysfunction of insulin production are nearly about the same thing, but they are different ways to discuss about it and they are not the exactly the same topic.

November 29, 2016

One of my colleagues remarked today that a lot of old biology software tools and libraries are designed by academics and abandoned as soon as their interest (and funding) switches to something else.
So it is very dangerous for a community or a business to rely on this kind of tools, as when something goes wrong there is no expert in sight ready to offer an helping hand.
I wondered if something could be done to improve this situation. At least a list of such abandoned biology software could be maintained.

October 17, 2017

Today we used the audio extracted from the video in a post below. It was taken by a Logitech 270 webcam which was pressed against my finger, hence the strange red pulsating color.

The audio was truncated as in the beginning the camera was moved a bit harshly and at the end the boiler used by my other self was creating an increasing white noise.

We just used Hjerte 0.3 (available on this Hackaday page and on Github) to classify the heart sound. It recognizes my heart beat quite well, and seems to find convincing S1 and S2 sounds.

What are the take home points?

– There is no need for a ultrasound Doppler to record heart sounds. Hence the “passive” in the title. It is important as some people dislike the idea of using ultrasounds in mass market devices.

– Heart sounds can be recorded on a finger!

– The Hjerte algorithm works even in weird conditions (ordinary microphone, lot of noise).

Now we have to use the video component in the file, and use it in conjunction with the audio part. An obvious use would be to have a reliable heart beat detection, before starting the segmentation.

We are dreaming of multiple webcams recording heart and lung sounds and integrating all those information!

July 12, 2017

The first usable versions of our features detection code ( were full of hardwired constants and heuristics.
The code has now been modularized, it spreads several methods with clean condition of method exit.

We were proud that our design was able to look at each beat and heart sound, which is a far greater achievement than what ML code does usually. Something really interesting was how we used compression to detect heart sounds features automatically in each beat.
Now we introduce something similar in spirit: Until now sometimes our code was unable to find the correct heart rate if the sound file was heavily polluted by noise. Now we use a simple statistical test akin to standard deviation, to test the randomness of beats distribution. If it is distributed at random, then it means our threshold is too low: We detect noise in addition to the signal.
This helped us to improve the guessing of the heart rate.

In an unrelated area, we also started to work on multi-HMM, which means detecting several, concurrent features. An idea that we toy with, would be to use our compression trick, at beat level, whereas now it is used at heart sound level. This is tricky and interesting in the context of a multi-HMM. Indeed it makes multi-HMM​ more similar to unsupervised ML algorithms.

October 30, 2016

Some 15 years ago, ontologies were the big thing. Financing an EU project was easy if ontologies and semantics were mentioned as primary goals.
Now this time is gone, except in biology where ontologies are still used, often in a very different way from what they were originally intended to do in the “Semantic Web” good old time.

More specifically a common biology research activity is to measure the expression of proteins in two situations, for example in healthy people and in patients. Then the difference between the two sets of measurements is asserted, and the proteins and their genes that are activated in the illness situation are suspected to be possible targets for any new drug.

Gene differential expression is the biological counterpart of machine learning in CS, it is a one size fits all solving methodology.

Indeed those deferentially expressed genes are rarely possible targets for any new drug , as each protein and gene is implicated in so many pathways. So instead of refining the experimentation, to find genes that are implicated in a fewer pathways, a gene “enrichment” step is launched. “Enrichment” involves querying an ontology database, to obtain a list of genes/proteins that are related to the deferentially expressed genes, and that are hopefully easier target for putative drugs.

Here there are two problems.
* The first is the choice of the ontology, for example there is an excellent one which is named Uniprot. But there are some awful but often preferred choices, like Gene Ontology which gives dozens results when Uniprot gives one. Indeed if you have only one result after “enrichment” and if you are in a hurry, you are not happy, so the incentive to switch to Gene Ontology is strong.
* The second problem arises when the result set comprises several hundred genes/proteins. Obviously this is useless, but instead of trying to define a better experimentation, people thought that some statistical criteria would sort the result set and find a credible list of genes. This lead to the design of parameter free tools such as GSEA. Very roughly these tools compare the statistical properties of the result set with those of a random distribution of genes, if they are very different, then the conclusion is that they are not at random, which does not tell much more than that. This is similar and related to the criticism of fisher test, p-value and null hypothesis. This is a complicated domain of knowledge.

These tools are very smart, but the best tool cannot provide meaningful answers from garbage, so disputes soon arisen about the choice of the free parameter methodology, instead of questioning the dubious practices that made them needed in the first place.

October 5, 2017

So, since one year we did develop a heart failure detector. For a proof of concept, it works quite well and it is open source.
The making of was documented on Hackaday:

We invite you to review this work, adapt it, and sell it in the format that fits the best your goals and capabilities. If you need help just ask us with the contact form:

Now is the time to plan our next steps.
First we observe that technology is often a bit arrogant when it comes to medicine.
It is not because a heart is detected as having some problem at some time that the patient problem is solved. What is detected could be the sign of many disorders and thus doesn’t tell what is wrong. This heart problem may be transient or linked to another condition. It may not exist at all. It is the role of the family doctor to capture the full picture and prioritize issues and define treatments.

What we envision is a kind of tooling for family doctors that is coupled to a physiology model. It is actually an extension of what doctors use today, it is not a revolution. Today doctors can browse conditions and remedies on their computer according to parameters they define.
What are missing are the data sources, that for now are provided through medical diagnosis that is the realm of specialists.

Ideally, no new software should be installed on the doctor’s computer, she would securely access the physiology modeling tool through her browser. And this physiology modeling tool would access her data sources securely without the doctor’s IT service having to install them.

July 3, 2017

Up to now the feature detection has used something that I find funny, but it works really well. As we use Hidden Markov Models, we must create a list of “observations” for which the HMM infer a model (the hidden states). So creating trustable observations is really important, it is a design decision that those observations would be the “heart sounds” that cardiologists name S1, S1, etc..

In order to detect those events, we first have to find the heart beats, then find sonic events in each of them. In CINC/Physionet 2016 they use a FFT to find the the basic heart rate, and because a FFT cannot inform on heart rate variability, they compute various statistical indicators linked to heart rate variability.
And its not a very good approach as the main frequency of a FFT is not always the heart beat rate.
Furthermore this approach is useless at the heart beat level and indeed at heart sound level. So what we did, was to detect heart beats (which is harder that one could think) and from that point, we can detect heart sounds.

Having a series of observations that would consist only of four heart sounds, would not be useful at all. After all a Sn+1 heart sound, is simply the heart sound that comes after the Sn heart sound. We needed more information to capture and somehow pre-classify the heart sounds.

It was done (after much efforts) by computing a signature based somehow on a compressed heart sound. Compression is a much more funny thing that it might seem. To compress one has to reduce the redundant information as much as possible, which means that a perfectly compressed signal could be used as a token about this signal, and logical operations could be done with it.

Sometimes people in AI research fantasize that compression is the Graal of machine learning by making feature detection automatic. We are far from thinking that, as that in order to compress one has to understand how the information is structured, and automatic feature detection implies that we do not know its structure.

It is the same catch-22 problem that the Semantic Web met 10 years ago, it can reason on structured data but not on unstructured data, and the only thing that would have been a real breakthrough was reasonning on unstructured data. That is why now we have unsupervised Machine Learning with algorithms like Deep Forest. While Cinc 2016 submissions used heavily unsupervised ML, we used compression (Run Limited Length) to obtain a “signature” of each heart sound, and it works surprisingly well with our HMM.

The next step is to implement a Multi-HMM Approach​, because there are other possibilities to pre-categorize our heart sounds than its RLL signature, for example the heart sound might be early or late and that characteristic could be used to label it.

January 10, 2017

Modern wireless technology can’t transmit energy and information with a good enough SNR, over 80km and over earth curve, in portable low cost devices with current regulations.

We propose a very different approach based on astronomy technology, where a laser emits light vertically, generates a luminous dot at high altitude (similar to astronomy’s guidestar) and this light is detected at very long distance. By modulating the luminosity of this guidestar, it is possible to transmit information. This technology works even if the sky is cloudy and in daylight.

There is no need to build any infrastructure network. Each cell in a field can access the base station even at 80km. The cost per field station is less than $9,000. Field stations can be moved at will.

More information here: base_station_for_deserts


Forum Jump:

Users browsing this thread: 1 Guest(s)