Solorio, T., Fuentes, O., Terlevich, R., Terlevich, E., & Bressan, A. 2003, in ASP Conf. Ser., Vol. 314 Astronomical Data
Analysis Software and Systems XIII, eds. F. Ochsenbein, M. Allen, & D. Egret (San Francisco: ASP), 609
Automated Determination of Stellar Population Parameters in Galaxies Using Active Instance-based Learning
Thamar Solorio, Olac Fuentes, Roberto Terlevich, Elena Terlevich
INAOE, Luis Enrique Erro #1, Tonantzintla, Puebla, 72840,
México
Alessandro Bressan
Osservatorio Astronomico di Padova, Vicolo dell'Osservatorio 5, 35122 Padova, Italy
Abstract:
In this work we focus on the determination of the relative distributions of young, intermediate-age and old populations of stars in galaxies. Starting from a grid of theoretical population synthesis models, we constructed a set of model galaxies with a distribution of ages, metallicities and intrinsic reddening. Using this set, we have explored a new fitting method that presents several advantages over conventional methods. We propose an optimization technique that combines active learning with an instance-based machine learning algorithm. Experimental results show that this method can estimate the physical parameters of the stellar populations with high speed and accuracy.
The availability, for the first time, of huge astronomical spectroscopic surveys such as the SDSS will allow the determination of intrinsic physical parameters for a large number of galaxies, including the age distribution (or star formation history) and the metallicity distribution of their stellar populations.
The importance of accurate knowledge of these parameters for cosmological studies and for the understanding of galaxy formation and evolution cannot be overestimated. Template fitting has been used to estimate the distribution of age and metallicity from spectral data. Although this technique achieves good results, it is very expensive in terms of computing time and can therefore be applied only to small samples.
Starting from a grid of theoretical population synthesis models, we constructed a set of model galaxies with a distribution of ages, metallicities and intrinsic reddening. Using this set, we have explored a new method designed for both speed and accuracy. Our proposed technique combines standard least-squares fitting with an active instance-based machine learning algorithm. Experimental results show that this method can estimate the physical parameters of the stellar populations with high speed and accuracy. Based on empirical evidence, we believe that this method can be applied with equal success to other astronomical problems, reducing the computational cost and thus making it possible to analyze larger quantities of astronomical data.
For the spectral synthesis of simple stellar populations, the atmosphere models have been folded with the predicted number of stars along isochrones of given age and metal content (Bressan et al. 1994). The atmosphere models have been inserted into low-resolution Kurucz models (Kurucz 1993) in order to preserve the complete energy distribution.
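In its simplest form, folding atmosphere spectra with the predicted number of stars along an isochrone is a star-count-weighted sum of stellar spectra. The sketch below assumes that every isochrone point already has an atmosphere spectrum sampled on a common wavelength grid; the helper name `ssp_spectrum` is hypothetical, and interpolation between atmosphere models is not shown.

```python
from typing import Sequence


def ssp_spectrum(star_counts: Sequence[float],
                 stellar_spectra: Sequence[Sequence[float]]) -> list[float]:
    """Sum stellar spectra weighted by the number of stars at each isochrone point.

    Hypothetical helper: star_counts[i] is the predicted number of stars at
    isochrone point i, and stellar_spectra[i] is the atmosphere-model flux for
    that point, sampled on a wavelength grid shared by all points.
    """
    if len(star_counts) != len(stellar_spectra):
        raise ValueError("need one spectrum per isochrone point")
    n_wave = len(stellar_spectra[0])
    total = [0.0] * n_wave
    for count, flux in zip(star_counts, stellar_spectra):
        if len(flux) != n_wave:
            raise ValueError("all spectra must share the wavelength grid")
        for k in range(n_wave):
            total[k] += count * flux[k]
    return total


# Two isochrone points, three wavelength samples.
print(ssp_spectrum([2.0, 1.0], [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]))  # -> [2.5, 4.5, 6.5]
```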
The models have the following characteristics:
- Ages are sampled in logarithmic steps.
- Metallicity takes the values Z = [0.0004, 0.004, 0.008, 0.02, 0.05], where Z = 0.02 is solar.
- The spectra are smoothed to the desired resolution.
For the present experiments we used solar metallicity (Z = 0.02) and a resolution of 20 Å.
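One common way to smooth a spectrum to a target resolution such as 20 Å is to convolve it with a Gaussian whose FWHM equals that resolution. The kernel is not specified here, so the Gaussian profile, the 4-sigma truncation, the per-pixel renormalization and the function name `smooth_to_resolution` are all assumptions of this sketch.

```python
import math
from typing import Sequence


def smooth_to_resolution(wave: Sequence[float], flux: Sequence[float],
                         fwhm: float) -> list[float]:
    """Smooth a spectrum with a Gaussian kernel of the given FWHM.

    wave and fwhm share the same units (e.g. Angstrom). The kernel is
    truncated at 4 sigma and renormalized at every pixel, so pixels near
    the edges of the spectrum are not biased low.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> sigma
    smoothed = []
    for w0 in wave:
        num = den = 0.0
        for w, f in zip(wave, flux):
            if abs(w - w0) <= 4.0 * sigma:
                k = math.exp(-0.5 * ((w - w0) / sigma) ** 2)
                num += k * f
                den += k
        smoothed.append(num / den)  # den >= 1 because w0 itself is included
    return smoothed


# A unit spike on a 5 Angstrom grid, smoothed to 20 Angstrom FWHM.
wave = [5.0 * j for j in range(41)]
spike = [1.0 if j == 20 else 0.0 for j in range(41)]
print(round(sum(smooth_to_resolution(wave, spike, 20.0)), 12))  # -> 1.0
```

The double loop is O(n²) and is meant only to make the operation explicit; away from the edges the total flux of the spike is preserved.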
Given an observed galaxy spectrum, we would like to determine the relative distribution of ages and the intrinsic reddening of its stellar populations. We restricted the problem to finding the contributions of three age groups: a starburst population of age 1 Myr, an intermediate-age population with ages between 100 Myr and 1000 Myr, and an old population with ages greater than 1000 Myr. Each of the three populations is affected by the same reddening law, defined as follows:

(1) 
where $R$ is the free reddening parameter of each stellar population and $\lambda$ is the wavelength, here ranging from 890 Å to 2.301 μm.
In order to determine the free reddening parameters and the relative contributions, we pose the task as an optimization problem in which a modified version of a machine learning algorithm is trained to estimate the reddening parameters of the three populations. Once we have an estimate of the reddening, we can compute the relative contributions of ages, $A$, using a pseudo-inverse matrix as follows:
Let $M$ be the grid of our nine theoretical models described earlier, $O$ the observed spectrum, and $R = (R_1, R_2, R_3)$ the vector of free reddening parameters predicted by the learning algorithm for $O$. We can compute $M_R$, the reddened model grid, by applying the reddening function to the theoretical models as defined in equation 2.

(2) 
We know that the observed spectrum $O$ is the product of the reddened model grid $M_R$ and the unknown relative contributions $A$:

$$O = M_R\,A \qquad (3)$$

$$A = M_R^{+}\,O \qquad (4)$$

Then, by computing $M_R^{+}$, the pseudo-inverse of $M_R$, we can determine the relative contributions of ages, as equation 4 shows. The following section introduces the optimization procedure used in this work.
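Equation 4 can be evaluated with any least-squares solver. The minimal sketch below, with hypothetical names, solves the normal equations $(M_R^T M_R)\,A = M_R^T O$, which give the same $A$ as the pseudo-inverse when $M_R$ has full column rank. The reddened model spectra are assumed to be already computed and sampled on the observed wavelength grid.

```python
from typing import Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def age_contributions(models: Sequence[Sequence[float]],
                      observed: Sequence[float]) -> list[float]:
    """Least-squares A with M_R A ~= O, i.e. A = M_R^+ O for full-column-rank M_R.

    models[k] is the k-th reddened model spectrum (a column of M_R);
    observed is the spectrum O on the same wavelength grid.
    """
    p = len(models)
    # Normal equations: (M_R^T M_R) A = M_R^T O
    gram = [[sum(u * v for u, v in zip(models[i], models[j])) for j in range(p)]
            for i in range(p)]
    rhs = [sum(u * o for u, o in zip(models[i], observed)) for i in range(p)]
    return solve(gram, rhs)


# Three reddened model spectra (columns of M_R) sampled at five wavelengths.
models_r = [[1.0, 0.0, 0.0, 1.0, 2.0],
            [0.0, 1.0, 0.0, 1.0, 1.0],
            [0.0, 0.0, 1.0, 2.0, 0.0]]
true_a = [0.2, 0.5, 0.3]
observed = [sum(a * col[k] for a, col in zip(true_a, models_r)) for k in range(5)]
print([round(v, 6) for v in age_contributions(models_r, observed)])  # -> [0.2, 0.5, 0.3]
```

Forming $M_R^T M_R$ squares the condition number of the problem; in practice an SVD-based routine such as `numpy.linalg.pinv` or `numpy.linalg.lstsq` is the more robust choice.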
We are interested in the problem of finding the parameters of a known analytic function that best match an observation. Let $y$ be the observed galactic spectrum and let $f(x)$ be a function with the same dimensionality as $y$. The goal of the optimization procedure is to obtain the value of $x$ that minimizes the error $\|f(x) - y\|$. In order to solve the problem more efficiently, we pose it as a learning problem, in which a learning algorithm learns the reddening parameters $x$ and a forward model computes $f(x)$. The training set used by the algorithm, $P$, is formed by randomly generated reddening parameters, $x_1, \ldots, x_n$, and their corresponding galactic spectra, $f(x_1), \ldots, f(x_n)$, where the contributions of ages were also generated randomly. Its test set consists of the galactic spectra to be analyzed, denoted here by $T = \{y_1, \ldots, y_m\}$, and it outputs an estimate $\hat{x}_i$ for each $y_i$ that is expected to minimize the errors $\|f(\hat{x}_i) - y_i\|$. When a new set of solutions $\hat{x}_i$ is proposed by the algorithm, we compute their corresponding spectra $f(\hat{x}_i)$ using equations 2, 3 and 4, use the new pairs $(f(\hat{x}_i), \hat{x}_i)$ to augment the training set, and continue this iterative process
until convergence is attained. Since this type of active learning adds to the training set examples that are progressively closer to the points of interest, the errors are expected to decrease with every iteration. The pseudocode of the algorithm is the following:
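1. Randomly generate an initial set of parameter vectors $x_1, \ldots, x_n$ and compute their corresponding spectra $f(x_1), \ldots, f(x_n)$.
2. Let $P = \{(f(x_1), x_1), \ldots, (f(x_n), x_n)\}$ be the initial training set.
3. Let $T = \{y_1, \ldots, y_m\}$ be the test set.
4. While $T$ is not empty:
   4.1 Train an approximator A using $P$ as the training set.
   4.2 For each $y_i$ in $T$:
       4.2.1 Use A to predict $\hat{x}_i$.
       4.2.2 Generate $f(\hat{x}_i)$.
       4.2.3 Let $P = P \cup \{(f(\hat{x}_i), \hat{x}_i)\}$.
       4.2.4 If $\|f(\hat{x}_i) - y_i\| < \epsilon$, where $\epsilon$ is a chosen tolerance, remove $y_i$ from $T$.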
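The loop can be illustrated on a toy problem. In this sketch a hypothetical scalar `forward` function stands in for equations 2-4, the "spectrum" is a single number, and the approximator is a one-dimensional locally weighted linear regression whose bandwidth is the distance to the third-nearest training point. These choices, the tolerance and the iteration cap are assumptions for illustration, not the settings used in this work. Step numbers in the comments follow the pseudocode.

```python
import math
import random


def forward(x: float) -> float:
    """Toy forward model (hypothetical): maps one parameter to a scalar 'spectrum'.

    Stands in for equations 2-4, which map reddening parameters to a model spectrum.
    """
    return x ** 3 + x


def lwlr_1d(train: list[tuple[float, float]], query: float, k: int = 3) -> float:
    """Predict the parameter for `query` from (spectrum, parameter) pairs.

    Locally weighted linear regression with a Gaussian kernel whose bandwidth
    is the distance to the k-th nearest training spectrum (an assumption here).
    """
    dists = sorted(abs(s - query) for s, _ in train)
    h = max(dists[min(k, len(dists)) - 1], 1e-12)
    w = [math.exp(-0.5 * ((s - query) / h) ** 2) for s, _ in train]
    sw = sum(w)
    ms = sum(wi * s for wi, (s, _) in zip(w, train)) / sw
    mx = sum(wi * x for wi, (_, x) in zip(w, train)) / sw
    var = sum(wi * (s - ms) ** 2 for wi, (s, _) in zip(w, train))
    cov = sum(wi * (s - ms) * (x - mx) for wi, (s, x) in zip(w, train))
    if var == 0.0:  # all weighted points share one spectrum: use the weighted mean
        return mx
    return mx + cov / var * (query - ms)


def active_fit(targets: list[float], n_init: int = 20, tol: float = 1e-8,
               max_iter: int = 50, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    # Steps 1-2: random initial parameters and their spectra form P.
    params = [rng.uniform(0.0, 2.0) for _ in range(n_init)]
    train = [(forward(x), x) for x in params]
    pending = dict(enumerate(targets))  # step 3: the test set T
    estimates: dict[int, float] = {}
    for _ in range(max_iter):  # step 4, capped so the sketch always terminates
        if not pending:
            break
        snapshot = list(train)  # 4.1: LWLR is lazy, so "training" = fixing P
        for i, y in list(pending.items()):  # 4.2
            x_hat = lwlr_1d(snapshot, y)  # 4.2.1 predict the parameter
            s_hat = forward(x_hat)        # 4.2.2 forward-model its spectrum
            train.append((s_hat, x_hat))  # 4.2.3 augment P
            estimates[i] = x_hat
            if abs(s_hat - y) < tol:      # 4.2.4 converged: remove from T
                del pending[i]
    return [estimates[i] for i in range(len(targets))]


targets = [1.0, 3.0, 5.0]
for y, x in zip(targets, active_fit(targets)):
    print(f"y={y}: x_hat={x:.6f}, residual={abs(forward(x) - y):.1e}")
```

Because each prediction is forward-modelled and added to $P$, later local fits are built from points that lie ever closer to the target, which is what drives the residuals down in this toy setting.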
In this problem the approximator mentioned in step 4.1 is Locally Weighted Linear Regression, an instance-based learning algorithm that has shown good results in similar optimization problems (Fuentes & Solorio 2003).
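Locally weighted linear regression fits a separate weighted least-squares linear model around every query point. The sketch below assumes a Gaussian kernel with bandwidth `tau` and adds a small ridge term so the local normal equations stay solvable; the kernel, the bandwidth and the function names are illustrative assumptions rather than details given here.

```python
import math
from typing import Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lwlr_predict(inputs: Sequence[Sequence[float]], outputs: Sequence[float],
                 query: Sequence[float], tau: float = 1.0,
                 ridge: float = 1e-8) -> float:
    """Locally weighted linear regression evaluated at a single query point.

    Training point i gets weight exp(-||x_i - q||^2 / (2 tau^2)); a weighted
    least-squares linear model with intercept (plus a small ridge term) is
    fitted and evaluated at q.
    """
    weights = [math.exp(-sum((u - v) ** 2 for u, v in zip(x, query)) / (2.0 * tau ** 2))
               for x in inputs]
    rows = [[1.0, *x] for x in inputs]  # design matrix with an intercept column
    p = len(query) + 1
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, outputs)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    return beta[0] + sum(b * q for b, q in zip(beta[1:], query))


# Noise-free data from y = 2 + 3*x1 - x2: a local linear fit reproduces it.
grid = [(i / 4.0, j / 4.0) for i in range(5) for j in range(5)]
ys = [2.0 + 3.0 * a - b for a, b in grid]
print(round(lwlr_predict(grid, ys, (0.3, 0.7), tau=0.5), 6))  # -> 2.2
```

In the setting described above, the inputs would be training spectra and each output a single reddening parameter, so estimating the three reddening parameters would take one prediction per component.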
Figure 1:
In this figure we show test and predicted spectra shifted
by a constant amount
to aid visualization. Vectors A and R are the parameters for the test
spectrum, while A' and R'
are the corresponding predicted parameters.

Table 1:
Mean absolute errors in reddening parameters

In order to evaluate the proposed method, we randomly generated 500 age distributions together with metallicities and intrinsic reddening, and then computed their corresponding spectra. From this set we randomly selected 150 spectra as the test set; the remaining spectra were used as the training set. We repeated this process 10 times and report the overall average. Table 2 presents the mean absolute errors in estimating age distributions, and Table 1 shows the errors in the reddening parameters. Figure 1 compares a test example with its prediction. On average, it takes 15 seconds to predict the parameters of a single spectrum.
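The evaluation protocol above (150 randomly chosen test spectra, the rest used for training, repeated 10 times and averaged) is a repeated hold-out estimate of the mean absolute error. A minimal sketch follows, in which `fit_predict` is a hypothetical callback wrapping whichever estimator is being tested and errors are reported per parameter.

```python
import random
from typing import Callable, Sequence

Example = tuple[Sequence[float], Sequence[float]]  # (spectrum, true parameters)


def repeated_holdout_mae(data: Sequence[Example],
                         fit_predict: Callable[[list[Example], list[Sequence[float]]],
                                               list[Sequence[float]]],
                         n_test: int = 150, n_repeats: int = 10,
                         seed: int = 0) -> list[float]:
    """Mean absolute error per parameter, averaged over repeated random splits."""
    rng = random.Random(seed)
    n_params = len(data[0][1])
    totals = [0.0] * n_params
    for _ in range(n_repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        preds = fit_predict(train, [spectrum for spectrum, _ in test])
        for k in range(n_params):
            totals[k] += sum(abs(p[k] - truth[k])
                             for p, (_, truth) in zip(preds, test)) / len(test)
    return [t / n_repeats for t in totals]


def predict_zeros(train: list[Example], spectra: list[Sequence[float]]) -> list[list[float]]:
    """Hypothetical baseline that ignores its input and always predicts zeros."""
    return [[0.0, 0.0] for _ in spectra]


data = [([float(i)], [1.0, 2.0]) for i in range(500)]
print(repeated_holdout_mae(data, predict_zeros))  # -> [1.0, 2.0]
```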
We have presented an optimization algorithm that can estimate age distributions and reddening of stellar populations in galaxies with high accuracy. The algorithm achieves convergence by iteratively creating new data points that lie in the vicinity of the query point. One important feature of this method is its high speed: it takes 15 seconds to estimate the parameters of a single spectrum. This represents a great advantage over more conventional methods proposed for this problem, which may take several hours to find the solution for a single spectrum.
References
Bressan, A., Chiosi, C., & Fagotto, F. 1994, ApJS, 94, 63
Fuentes, O., & Solorio, T. 2003, AIA2003, Spain
Kurucz, R. L. 1993, CD-ROM No. 13: ATLAS9, SAO, Harvard, Cambridge
© Copyright 2004 Astronomical Society of the Pacific, 390 Ashton Avenue, San Francisco, California 94112, USA