Constraint-based causal discovery with tiered background knowledge
Publication date
2025-03-11
Authors
Supervisors
Reviewers
Magliacane, Sara
Abstract
This thesis explores how information about temporal structure can improve causal discovery methods. I consider extensions of three existing constraint-based causal discovery algorithms: PC, FCI, and IOD. These algorithms estimate graphs based on (conditional) independence testing and were originally purely data-driven. However, we often have more information available than the dependence structure alone. I refer to this as background knowledge, and it can be used to extend and improve the existing algorithms. In this thesis, I focus on information given by a temporal order, e.g. the order in which variables are measured. This gives rise to a special kind of background knowledge, which I refer to as temporal, or tiered, background knowledge.
That tiered background knowledge improves causal discovery methods appears evident. The novel contribution of this thesis is a thorough investigation of the ways in which the algorithms are improved. This includes formal results and examples, as well as empirical results using both simulated and real data. The improvements due to tiered background knowledge fall into one of two categories: informativeness and accuracy. Constraint-based causal discovery suffers from under-identifiability of the causal structure, since multiple graphs may encode the same dependence structure. An improvement in informativeness means increased certainty about the causal connections. In practice, the output graphs are often incorrect due to errors arising from independence testing on finite-sample data. An improvement in accuracy means that the estimated graphs are less prone to such errors.
This thesis presents a formalisation of (tiered) background knowledge, and results on how it improves constraint-based causal discovery in different aspects under different assumptions:
First, I assume that all relevant variables are observed, and that we are given the correct (conditional) independencies among the observed variables. I give a criterion for when an increase in informativeness is obtained by adding tiered background knowledge. This criterion suggests that adding background knowledge of early tiers yields the largest increase in informativeness, with the overall largest increase for sparse graphs. This is supported by the results of a simulation study. Moreover, I show that the graphical output has some desirable interpretational and computational properties.
Second, I relax the assumption of knowing the correct (conditional) independencies among the observed variables, while still assuming that all relevant variables are observed. I consider the properties of the existing tiered PC (tPC) algorithm. This is an extension of the original PC algorithm that skips some conditional independence tests and orients some (additional) edges, both based on tiered background knowledge. I show how this improves accuracy.
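The two ways in which tiered background knowledge enters a PC-style algorithm can be illustrated with a minimal Python sketch. It is not the thesis's tPC pseudocode: the function names, the toy variables, and the exact admissibility rule (conditioning only on neighbours that do not lie in a tier later than both tested variables) are assumptions made for illustration.

```python
from itertools import combinations


def orient_cross_tier_edges(skeleton, tier):
    """Split skeleton edges into directed and undirected ones.

    An edge between variables in different tiers is directed from the
    earlier tier to the later tier; edges within a tier stay undirected.
    """
    directed, undirected = set(), set()
    for a, b in skeleton:
        if tier[a] < tier[b]:
            directed.add((a, b))
        elif tier[b] < tier[a]:
            directed.add((b, a))
        else:
            undirected.add(frozenset((a, b)))
    return directed, undirected


def admissible_conditioning_sets(x, y, adjacent, tier, size):
    """List candidate conditioning sets of a given size for the pair (x, y).

    Assumed rule (hypothetical): only neighbours of x that are not in a
    tier later than both x and y may be conditioned on, so tests that
    would condition on the "future" are skipped.
    """
    latest = max(tier[x], tier[y])
    candidates = sorted(
        v for v in adjacent[x] if v != y and tier[v] <= latest
    )
    return [set(c) for c in combinations(candidates, size)]


# Toy example (hypothetical variables): A and B are measured first, then C, then D.
tier = {"A": 1, "B": 1, "C": 2, "D": 3}
skeleton = [("A", "B"), ("C", "A"), ("C", "D")]
adjacent = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}

directed, undirected = orient_cross_tier_edges(skeleton, tier)
sets_from_a = admissible_conditioning_sets("A", "C", adjacent, tier, 1)
sets_from_c = admissible_conditioning_sets("C", "A", adjacent, tier, 1)
```

In this toy example the edges A-C and C-D are oriented forward in time while A-B stays undirected, and no test of A against C conditions on D, because D is measured after both.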
Third, I again assume knowledge of (conditional) independencies among the observed variables. However, I allow for unobserved variables, and I combine multiple overlapping datasets. I describe the existing tiered FCI (tFCI) and introduce the novel tiered IOD (tIOD), which both extend the existing algorithms with tiered background knowledge, similar to the tPC. The output of the IOD algorithm consists of multiple graphs, which implies an additional level of under-identifiability. I show that tiered background knowledge can decrease this number of graphs.
Lastly, I discuss how the results presented here relate to similar work, as well as some open problems and possible extensions. All algorithms are provided as pseudo-algorithms, and are accompanied by proofs of soundness, and often also completeness.
Keywords
Causal discovery; Structure learning; Graphical models; Causality; Causal inference; Background knowledge; Cohort data; Longitudinal data; Temporal data
Institution
Department
Document type
Dissertation
Language
English
Files
Name
Constraint-based causal discovery with tiered background knowledge.pdf
Size
6.12 MB
Format
Adobe PDF
Checksum
(MD5):5c2b5e63c2cffad8a150ba5213905975