Data format
First row - learning parameters, written as name,value pairs
errorgoal - stop learning once the error goal is reached
numberofiterations - stopping criterion: stop learning once the maximum number of iterations is reached
Second row - variable names
The remaining rows - data values for training
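The three kinds of rows above can be read with a short Python sketch like the following. It assumes the first row always alternates parameter names with numeric values. parse_file is a hypothetical helper, not part of any existing tool. It keeps every parameter as a generic name/value entry, including maxnumberoflines, which appears in the example below but is not described in this section, and it leaves the data cells as raw strings.

```python
def parse_file(text):
    """Split a training file into (parameters, variable names, data rows).

    Assumes the first row alternates parameter names and numeric values.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a parameter row and a variable-name row")

    fields = lines[0].split(",")
    if len(fields) % 2 != 0:
        raise ValueError("parameter row must contain name,value pairs")
    params = {}
    for name, value in zip(fields[0::2], fields[1::2]):
        try:
            params[name] = int(value)
        except ValueError:
            params[name] = float(value)  # e.g. errorgoal,.000000001

    names = lines[1].split(",")
    rows = []
    for line in lines[2:]:
        cells = line.split(",")
        if len(cells) != len(names):
            raise ValueError(f"{line!r}: expected {len(names)} cells, got {len(cells)}")
        rows.append(cells)  # raw cell strings; '?', ':' etc. are not interpreted here
    return params, names, rows


if __name__ == "__main__":
    example = """errorgoal,.000000001,numberofiterations,50,maxnumberoflines,2
A,X,C
0,?,:0@1.0&1
1,?,:1@1.0&1"""
    print(parse_file(example))
```

Parameter values are tried as integers first and fall back to floats, so numberofiterations stays an integer while errorgoal becomes a float.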
, - the separator
? - indicates the missing value
: - indicates the queried value
@ - indicates the label of the query
& - indicates the frequency with which the query is asked (assumed to be uniformly distributed in our experiments)
everything else - indicates the actual value
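A single cell can be decoded according to these symbols with the Python sketch below. It assumes a query cell is always written in the order :value@label&frequency, as in the example that follows, with the @ and & parts optional. Cell and parse_cell are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    kind: str                          # "missing", "observed" or "query"
    value: Optional[str] = None        # observed value, or the queried value after ':'
    label: Optional[float] = None      # number after '@' (label of the query)
    frequency: Optional[float] = None  # number after '&' (how often the query is asked)


def parse_cell(text: str) -> Cell:
    """Parse one data cell; assumes query cells are written :value@label&frequency."""
    text = text.strip()
    if text == "?":
        return Cell("missing")
    if not text.startswith(":"):
        return Cell("observed", value=text)
    body = text[1:]
    frequency = None
    if "&" in body:  # '&' comes last, so split it off first
        body, freq_text = body.split("&", 1)
        frequency = float(freq_text)
    label = None
    if "@" in body:
        body, label_text = body.split("@", 1)
        label = float(label_text)
    return Cell("query", value=body, label=label, frequency=frequency)


if __name__ == "__main__":
    for cell in ["0", "?", ":0@1.0&1"]:
        print(parse_cell(cell))
```

Splitting off the frequency before the label is what makes the assumed ordering matter: a cell written as :0&1@1.0 would not parse with this sketch.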
The following is an example for the simple A->X->C network:
errorgoal,.000000001,numberofiterations,50,maxnumberoflines,2
A,X,C
0,?,:0@1.0&1
1,?,:1@1.0&1
These rows state that P(C=0|A=0)=1 and P(C=1|A=1)=1, with X missing in both rows.
Data generation
Input
Number of queries to generate
Evidence variables
Query variables (may overlap with evidence variables)
Amount of missing evidence
Procedure
For each training query to be generated
    For each evidence variable
        Select it as an observed evidence variable with the specified probability
    Assign values to the selected evidence variables using their prior probability distribution
    Randomly select a query variable
    Set a value for the query variable using its inferred probability distribution
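The procedure above can be sketched in Python for the A->X->C network under several assumptions that the text does not fix. The conditional probability tables are made-up numbers for illustration only. Each evidence variable is assumed to be observed with probability 1 - missing_rate. Evidence values come from one ancestral sample of the prior. The query value is sampled from the exact posterior, computed by enumeration, and its probability is used as the @ label. The frequency is fixed at 1, matching the uniform assumption. If the chosen query variable was also selected as evidence, the sketch drops that observation, since a cell holds either a value or a query. All function names are hypothetical.

```python
import itertools
import random

# Hypothetical CPTs for the A -> X -> C example network; all numbers are made up.
PARENTS = {"A": [], "X": ["A"], "C": ["X"]}
ORDER = ["A", "X", "C"]  # topological order, parents before children
# P_ONE[var][parent values] = P(var = 1 | parents); all variables are binary.
P_ONE = {
    "A": {(): 0.5},
    "X": {(0,): 0.1, (1,): 0.9},
    "C": {(0,): 0.2, (1,): 0.8},
}


def prob(var, value, assignment):
    """P(var = value | parents), reading parent values from assignment."""
    p_one = P_ONE[var][tuple(assignment[p] for p in PARENTS[var])]
    return p_one if value == 1 else 1.0 - p_one


def joint(assignment):
    """Probability of a complete assignment (chain rule over the network)."""
    result = 1.0
    for var in ORDER:
        result *= prob(var, assignment[var], assignment)
    return result


def sample_prior(rng):
    """Draw one complete assignment from the prior by ancestral sampling."""
    assignment = {}
    for var in ORDER:
        assignment[var] = 1 if rng.random() < prob(var, 1, assignment) else 0
    return assignment


def posterior(query, evidence):
    """Exact P(query | evidence) by enumerating the hidden variables."""
    hidden = [v for v in ORDER if v != query and v not in evidence]
    weights = {}
    for q in (0, 1):
        weights[q] = sum(
            joint({**evidence, **dict(zip(hidden, values)), query: q})
            for values in itertools.product((0, 1), repeat=len(hidden))
        )
    total = weights[0] + weights[1]
    return {q: w / total for q, w in weights.items()}


def generate_row(evidence_vars, query_vars, missing_rate, rng):
    """Generate one training row in the ?, :value@label&frequency format."""
    # Keep each evidence variable with probability 1 - missing_rate (assumption).
    observed = [v for v in evidence_vars if rng.random() >= missing_rate]
    # Assign evidence values by sampling from the prior.
    full = sample_prior(rng)
    evidence = {v: full[v] for v in observed}
    # Pick the query variable; a cell cannot be both observed and queried.
    query = rng.choice(query_vars)
    evidence.pop(query, None)
    # Infer P(query | evidence), then sample the query value from it.
    dist = posterior(query, evidence)
    value = 0 if rng.random() < dist[0] else 1
    cells = []
    for var in ORDER:
        if var == query:
            cells.append(f":{value}@{dist[value]:.6g}&1")  # uniform frequency
        elif var in evidence:
            cells.append(str(evidence[var]))
        else:
            cells.append("?")
    return ",".join(cells)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(generate_row(["A"], ["C"], 0.3, rng))
```

Exact enumeration is only practical for small networks like this one. For larger networks, the posterior step would need a proper inference routine, while the rest of the row-building logic stays the same.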