UNIVERSIDADE FEDERAL DO ABC

CENTRO DE CIÊNCIAS NATURAIS E HUMANAS

PROGRAMA DE PÓS-GRADUAÇÃO EM EVOLUÇÃO E DIVERSIDADE

Max Mario Fuhlendorf

LOCI IMAGE GENERATOR AND THE EVOLUTION OF TRYPANOSOMATID ECOTIN:

CUSTOMIZED SOFTWARE AS A TOOL FOR EVOLUTIONARY ANALYSIS

Santo André – SP

2018

Max M. Fuhlendorf

Loci image generator and the evolution of trypanosomatid ecotin:

customized software as a tool for evolutionary analysis

Dissertation presented to the Programa de Pós-graduação em Evolução e Diversidade of the Universidade Federal do ABC as a requirement for obtaining the degree of Master. Research line: bioinformatics and gene evolution.

Advisor: Márcia Aparecida Sperança

Co-advisor: Arlei Marcili

Santo André – SP

2018

To Cleide, Max and Josepha

Acknowledgements

I am deeply grateful to my parents, Cleide M. Fuhlendorf and Max Fuhlendorf Jr. They went far beyond what most parents do for their children, and no words of thanks could fully express my affection.

My advisor, friend and mentor Márcia Sperança taught me to think like a scientist without ever losing empathy, and she will always be a reference in my personal and professional life.

The faculty of UFABC is for the most part made up of dynamic, absurdly capable people who are willing to go out of their way to foster their students' learning and curiosity. To the dozens of them who left a mark on me, personally and professionally, I offer my sincere gratitude.

Finally, I thank my laboratory colleagues, friends for all-night study sessions, existential discussions and hugs when I needed them. Artur, Bianca, Daubian, Douglas, Stephanie and Thiago: you have a permanent place in my life.

“Slow though the process of selection may be, ... I can see no limit to the amount of change, to the beauty and infinite complexity of the coadaptations between all organic beings, one with another and with their physical conditions of life, which may be effected in the long course of time by nature's power of selection.”

DARWIN, Charles. 1859

“All that you touch You Change.

All that you Change Changes you.

The only lasting truth Is Change.

God Is Change. And hidden within Change

Is surprise, delight, Confusion, pain, Discovery, loss,

Opportunity, and growth. As always, God exists

To shape And to be shaped.”

BUTLER, Octavia. 1993

Resumo (English translation)

Modern molecular phylogenetics goes beyond simple sequence analyses. Genomic context is emerging as an increasingly relevant tool for inferring and understanding evolutionary and functional relationships between genes. Here we propose a simple online software program for identifying loci of interest in annotated genomic sequences and generating publication-ready images. The program was tested with a single use case, a phylogenetic analysis of a trypanosomatid enzyme that was horizontally transferred from bacteria in the common ancestor of the group, does not occur in other extant eukaryotes, and is relevant to the parasite's virulence.

Keywords: Genomics; Trypanosomatids; GenBank; Bioinformatics.

Abstract

Modern molecular phylogenetic research goes beyond simply analyzing

sequence data. Genomic context is increasingly relevant as a tool to infer and

understand evolutionary and functional relationships between genes. Here we

propose a simple web-based software program for labelling loci of interest in

annotated genomic sequences and generating ready-for-publication imagery. The

program was tested with a unique use case, a phylogenetic analysis of a

trypanosomatid enzyme that was laterally transferred from bacteria in the group’s

common ancestor, occurs in no other extant eukaryotes and is very important to

the parasite’s virulence.

Keywords: Genomics; Trypanosomatids; GenBank; Bioinformatics.

List of acronyms

BLAST Basic Local Alignment Search Tool

CDS Coding sequence

CHP Conserved Hypothetical Protein

CSS Cascading Style Sheets

CSS3 Cascading Style Sheets (version 3)

FASTA FAST(ALL)

GenBank National Institutes of Health's genetic sequence database

HTML Hypertext Markup Language

HTML5 Hypertext Markup Language (version 5)

ISP Inhibitor of Serine Peptidase Enzymes

NCBI National Center for Biotechnology Information

PHP PHP: Hypertext Preprocessor

px pixel

RefSeq National Center for Biotechnology Information's Reference Sequence database

SQL Structured Query Language

tBLASTn Translated Nucleotide Basic Local Alignment Search Tool

UCSC University of California Santa Cruz

WAMP Windows Apache MySQL PHP

Contents

Initial remarks
Chapter I: Online database-driven loci image generator
    Introduction
    Methodology
    Algorithm
        input.php
        parser.php
        insert.php
        fasta.php
        image.php
    Results and sample images
    Future improvements and functionalities
    Conclusion
Chapter II: Ecotin in Kinetoplastida
    Introduction
        General aspects of the Trypanosomatida order
        Serine Protease Inhibitors: ecotins and Trypanosomatida ISPs
    Methodology
    Results and discussion
Final remarks
References
Glossary

Initial remarks

Trypanosomatids are wildly successful agents of neglected tropical diseases. These diseases are neglected in part because they are concentrated in underdeveloped countries, and in part because the parasites are very complex and specialized organisms that are anything but easy to study.

When it was discovered in the early 2000s that they carry genes for a protein found in no other eukaryotes, a protein that is essential for the virulence of a large number of bacterial species, a small group of researchers started looking into this protein, ecotin, a wide-specificity serine protease inhibitor.

Here we try to advance knowledge in this area with a novel algorithmic approach. A software tool developed in-house to help resolve the phylogeny of a group of trypanosomatid ecotins took on a life of its own and became a multi-purpose loci analysis tool. Using this new methodology together with traditional phylogenetic tools, we aim to shed a little light on the evolution and functional importance of ecotins in trypanosomatids.

Chapter I: Online database-driven loci image generator

Introduction

When studying the evolution of proteins and other genomic features it is

very important to consider not only the specific nucleotide sequences, but also

their relative positions to one another in the chromosomes of living beings. Spatial

relationships can often provide clues to shared evolutionary history, as well as

being indicative of functional relationships between genes and regulatory

sequences of the genome working in tandem.

For this reason, since the early 2000s, a variety of software has been written with the express purpose of allowing easy visualization and editing of genomic data, usually organized as stacked linear sequences in an interactive web-based interface. These so-called genome browsers are extremely useful tools for a variety of tasks that go well beyond simply visualizing gene loci (SKINNER et al., 2009; STEIN et al., 2002). In a way, though, the flexibility that is the strength of most genome browsers is also a hindrance in some specific use cases. When one's goal is simply to display and compare specific gene loci between different species, the available genome browsers can become very cumbersome to use. Some are specialized and useful only in certain contexts, such as the human or Drosophila genome browsers; others are complex and difficult to use for people without specialized training. Among these challenges, one factor stands out: with most available software, comparing loci from different species is not trivial and requires a robust knowledge of the genome browser's settings and configuration procedures. The most used programs, like the UCSC Genome Browser [https://genome.ucsc.edu/cgi-bin/hgGateway], are heavily dependent on extensive databases of carefully annotated genomes, which makes them good tools mainly for research on the most studied model species (CASPER et al., 2018; UCSC GENOME BROWSER GATEWAY, [s.d.]).

For smaller-scale studies in which gene loci are important, especially if the species in question are less commonly studied, these tools are lacking. In some cases, when the species being studied don't have extensive and manually annotated genomes available, the more widely used genome browsers can become almost useless. It is not uncommon for researchers to manually sketch loci on a piece of paper, using automatically annotated genomes as a reference. When their research is concluded, those sketches need to be painstakingly transformed into informative image files for publication, arduous and repetitive work that could easily be automated.

Here we propose a simple web-based software program that can organize GenBank-formatted genomic data, offering database-driven search of coding sequences identified by user-provided labels. In its current incarnation, the main function of the software is the creation of clean graphical representations of target genomic loci, easily customizable and ready for publication. It can also create amino acid FASTA files with custom labels, ready for alignment and analysis. The software can store very large amounts of data in a standard MySQL database, from which multiple views can easily be created. The use of a PHP/JavaScript implementation for the server/client development favors flexibility and scalability for future uses. Even in its current beta stage, the software can be useful for researchers working with genomic loci in non-model organisms, providing an easy-to-use tool for visualization, data storage and retrieval. Usability is key: the user only needs to know how to copy and paste text, and how to input data into browser forms.

In the pages below, we present a brief description of the software creation process, its algorithm and the improvements planned for the release version. We also present a sample project created to analyze the occurrence of ecotin-like serine peptidase inhibitors in various species of Leishmania and Trypanosoma parasites. The current version of this software, though functional, is still in beta testing, and further interface improvements are ongoing before its general deployment.

Methodology

Once the lack of proper tools for creating comparative genomic loci

diagrams was identified, the software was conceived as an in-house solution at the

Pathogenic Agents Laboratory at the Federal University of ABC (UFABC). The entire

project was created based on reliable and widely used modern internet

technologies, all either open-source or free-to-use so as to guarantee future-

proofing and flexibility: the code can be reused and modified for different uses and

by other programmers.

Web page structure and design were created using basic HTML5 and CSS, taking advantage of the canvas element introduced in HTML5, which permits on-the-fly image creation (CASCADING STYLE SHEETS, [s.d.]; HTML 5.1 2ND EDITION, [s.d.]; DUCKETT, 2014). Server-side programming was written in the PHP interpreted language, using version 5 to maintain compatibility with older web hosting services (PHP: HYPERTEXT PREPROCESSOR, [s.d.]; WELLING; THOMSON, 2016). Client-side scripting was done in JavaScript, mostly to manipulate the canvas elements used for image creation (DUCKETT, 2014; JAVASCRIPT W3SCHOOLS, [s.d.]; WELLING; THOMSON, 2016). All data were stored and manipulated in MySQL, a simple and very efficient open-source relational database implementation that uses the industry-standard SQL database manipulation language (BEAULIEU, 2009; MYSQL, [s.d.]; SQL W3SCHOOLS, [s.d.]; WELLING; THOMSON, 2016).

The code for all the languages above was edited using Notepad++, an open-source text editor capable of formatting, annotating, validating and completing code for a variety of programming and design languages, including all those cited above (NOTEPAD++, [s.d.]). Testing and development were done offline with WampServer, an all-in-one web development platform that bundles an Apache web server, the PHP interpreter and the MySQL database in a single installable file for Windows operating systems; the acronym WAMP stands for Windows, Apache, MySQL, PHP (PHILLIPS, 2012; WAMPSERVER, [s.d.]). Online testing was done by manually transferring the code files to a personal Yahoo Small Business Web Hosting account that supports PHP and MySQL (YAHOO SMALL BUSINESS WEB HOSTING, [s.d.]). Both online and offline testing were done on a variety of modern web browsers compatible with HTML5, CSS3 and JavaScript, with a heavier focus on Google Chrome because it is the market leader by a large margin, being the browser of choice for almost two thirds of internet users as of May 2018 (STATCOUNTER GLOBAL STATS - BROWSER USAGE SHARE, [s.d.]).

The software is currently working with no apparent bugs and is going through the beta testing stage, in which it is tested by a select group of target users before full deployment (FINE, 2002). It is important to note that the references given above in this section are only a small fraction of the myriad sources consulted both online and offline, and should not be considered definitive in any sense. Modern web software development is a very dynamic enterprise: the platforms and languages are in a constant state of flux, with new features and bug fixes being introduced daily, and programmers need to consult a wide variety of internet sources, including tutorials, discussion forums and language reference manuals, on an almost hourly basis.

Algorithm

The algorithm is spread across a number of different files, as is customary in web-based PHP applications. Each file behaves like a web page and performs a specific task, such as data acquisition, data processing, data retrieval or image generation. In the final version, these functional web pages will be organized and accessible from a single master page that will serve as the main interface. In the current beta form, the software works in a more sequential manner: the user pastes the GenBank data into the input.php page; this data is then cleaned, formatted and prepared for database storage in the parser.php page, where the user is also asked to label all the CDS features (coding sequences) of interest; the insert.php page creates the database if necessary, performs further consistency checks on the data, and confirms to the user that the data insertion was successful; the fasta.php page creates an amino acid FASTA-formatted file from labelled sequences from different species, useful for alignment and further evolutionary studies; and, finally, the image.php page creates the visual representation of the genomic loci from a list of protein labels provided by the user, with some customization options available, such as a choice between a full chromosome view and a view zoomed into the area of interest between labelled proteins, and a choice of image size in pixels.

An additional file called functions.php is not directly accessible by the user,

as instead of performing a specific task, it contains the procedural functions for all

repetitive tasks used in the other pages, such as database connections, treatment

of warnings and error messages for the user, etc., and will not be explicitly detailed

below. The following sections explain these algorithms in detail, and the entire beta

source code of each of these pages is available as annexes at the end of this

document.

input.php

This page is very simple, containing only a very large input text box and a

submit button. The user only has to paste a complete GenBank record into the text

box and click on the submit button. No programming is employed at this stage, only

HTML and CSS for data input and page layout.

In the final version of the software this page will also include a login system

and a project selection/creation option, to allow for multiple users with more than

one project each.

parser.php

Most of the data manipulation and validation occurs in this page. GenBank records are very precisely structured (SAMPLE GENBANK RECORD, [s.d.]), a great feature for computational parsing, but since genomes are such complicated entities the number of required data validations is large. These are the steps performed at this stage:

1. The GenBank data input by the user is first split into an array of text

lines, with each line of the original text record assigned to an array

item;

2. The algorithm analyzes each of the array’s entries to determine its

nature, using regular expression text validations. In this first pass two

main areas are identified: the header containing sequence,

organism, chromosome and other assembly data, and the much

larger features section with all the annotated genomic features;

3. The header area of the array is looped through first with crucial

information being extracted, such as organism name, organism

definition and RefSeq id, essential as a unique sequence identifier

(PRUITT; BROWN; MURPHY, 2018);

4. The next step is another loop over the last part of the array, this time

over all the lines containing genomic features as identified in step 2.

Each complete feature is stored as a new text item in a newly created

features array;

5. The features array is looped through, and each item is again

processed using regular expressions to create a structured

representation of the feature in question to replace the text version.

Each feature can have a number of qualifiers, which are details such

as product name and translation for mRNA and CDS features. Since

the list of possible qualifiers for all different types of features is very

large, a separate array is created during this step containing unique

qualifier labels, as this data will be crucial for database creation and

data insertion in the next page;

6. After data processing is complete, the algorithm prints a complete

list of GenBank features in a HTML table with a corresponding text

field attached to the details of each feature, and a visual

representation of its relative position in the nucleotide sequence. In

the text field the user should label the features of interest, as those

will be used by the loci visualization tool later in the page image.php;

7. When all labels are typed by the user, they click on the submit button

and the page submits the header data, the features array and the

qualifiers array for database inclusion, which happens in the next

page, insert.php.

Future versions of this page will have mostly cosmetic refinements to make

it more user friendly.
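
The parsing steps above can be sketched as follows. This is a minimal illustration in Python, not the PHP code of parser.php: the function names (split_record, parse_header, parse_features), the regular expressions and the shortened sample record are all assumptions made for this example. It also assumes that the RefSeq id is read from the VERSION line, and it only handles single-line qualifiers and simple numeric locations.

```python
import re

# Hypothetical, heavily shortened GenBank-style record used only for this sketch.
SAMPLE = """\
LOCUS       NC_000000               1200 bp    DNA     linear   INV 01-JAN-2018
DEFINITION  Example organism chromosome 1, partial sequence.
VERSION     NC_000000.1
SOURCE      Example organism
  ORGANISM  Example organism
FEATURES             Location/Qualifiers
     source          1..1200
                     /organism="Example organism"
     CDS             100..400
                     /product="hypothetical protein"
                     /translation="MKT"
     CDS             complement(600..900)
                     /product="ecotin-like protein"
ORIGIN
//
"""

HEADER_RE = re.compile(r"^\s*(DEFINITION|VERSION|ORGANISM)\s+(.+)$")
FEATURE_RE = re.compile(r"^ {3,8}(\S+)\s+(\S+)$")    # feature key + location
QUALIFIER_RE = re.compile(r"^\s+/(\w+)(?:=(.*))?$")  # /name="value"


def split_record(text):
    """Steps 1-2: split into lines, then into header and feature sections."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("FEATURES"))
    end = next((i for i, line in enumerate(lines) if line.startswith("ORIGIN")), len(lines))
    return lines[:start], lines[start + 1:end]


def parse_header(header_lines):
    """Step 3: pull a few key fields out of the header."""
    header = {}
    for line in header_lines:
        match = HEADER_RE.match(line)
        if match:
            header[match.group(1).lower()] = match.group(2).strip()
    return header


def parse_features(feature_lines):
    """Steps 4-5: build structured features and collect unique qualifier names."""
    features, qualifier_names = [], []
    for line in feature_lines:
        start = FEATURE_RE.match(line)
        if start:
            location = start.group(2)
            positions = [int(n) for n in re.findall(r"\d+", location)]
            features.append({
                "type": start.group(1),
                "start": min(positions),
                "end": max(positions),
                "complement": location.startswith("complement"),
                "qualifiers": {},
            })
            continue
        qual = QUALIFIER_RE.match(line)
        if qual and features:
            name, value = qual.group(1), (qual.group(2) or "").strip('"')
            features[-1]["qualifiers"][name] = value
            if name not in qualifier_names:
                qualifier_names.append(name)
    return features, qualifier_names


header_lines, feature_lines = split_record(SAMPLE)
header = parse_header(header_lines)
features, qualifier_names = parse_features(feature_lines)
```

In this sketch the list of unique qualifier names is kept separate from the features, mirroring step 5, because the next page needs it to decide which database columns must exist. A real record would also need handling for qualifier values that wrap over several lines.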

insert.php

This page requires almost no user interaction. It validates the data provided by the parser.php algorithm, verifies the database status and creates tables and columns as needed, checks for duplicate data and alerts the user, and finally inserts the data into the database if no errors are found. These are the steps performed at this stage:

1. First the algorithm checks if the database tables have already been

created, creating them if needed: the main table to contain sequence

header data and the features table to contain all genomic features

from the sequence, as well as the user provided labels.

2. Using the list of qualifiers created in step 5 of the parser.php

execution, the algorithm checks for any missing columns in the

features table and creates them;

3. Then the algorithm inserts the header with the sequence details

obtained in the last page into the main table, including RefSeq id; if a

duplication is detected, the algorithm fails with an error message to

the user indicating the reason: no duplicate RefSeq ids are allowed;

4. If the insertion of header data occurs correctly, the structured list of

features is composed into a large single insert SQL statement which

is then passed to the database; if any errors occur the insert is

cancelled and the user alerted;

5. If no errors are detected in step 4, the user is informed that the

database insert was successful.

Future versions of this page will have cosmetic refinements to make it more

user friendly, and probably an option to modify or delete previously inserted data

if the user tries to insert duplicates.
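
The insertion steps above can be sketched as follows. This is a hedged illustration in Python using the standard-library sqlite3 module in place of MySQL, so that the example is self-contained; the table names (sequences, features), the column names, the q_ prefix for qualifier columns and the function names are all assumptions, not the schema used by insert.php. The thesis describes a single large INSERT statement, whereas this sketch inserts row by row inside one transaction, which gives the same all-or-nothing behavior.

```python
import sqlite3


def qualifier_column(name):
    """Map a qualifier name to a column name, rejecting unsafe identifiers."""
    column = "q_" + name
    if not column.isidentifier():
        raise ValueError(f"unsafe qualifier name: {name!r}")
    return column


def ensure_schema(conn, qualifier_names):
    """Steps 1-2: create the tables if missing, then add missing qualifier columns."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "refseq_id TEXT PRIMARY KEY, organism TEXT, definition TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "id INTEGER PRIMARY KEY, refseq_id TEXT NOT NULL, type TEXT, "
        "start_pos INTEGER, end_pos INTEGER, label TEXT)"
    )
    existing = {row[1] for row in conn.execute("PRAGMA table_info(features)")}
    for name in qualifier_names:
        column = qualifier_column(name)
        if column not in existing:
            conn.execute(f'ALTER TABLE features ADD COLUMN "{column}" TEXT')
            existing.add(column)


def insert_record(conn, header, features):
    """Steps 3-5: insert header and features atomically; reject duplicate ids."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO sequences (refseq_id, organism, definition) VALUES (?, ?, ?)",
                (header["refseq_id"], header["organism"], header["definition"]),
            )
            for feat in features:
                columns = ["refseq_id", "type", "start_pos", "end_pos", "label"]
                values = [header["refseq_id"], feat["type"], feat["start"],
                          feat["end"], feat.get("label")]
                for name, value in feat["qualifiers"].items():
                    columns.append(qualifier_column(name))
                    values.append(value)
                quoted = ", ".join(f'"{c}"' for c in columns)
                marks = ", ".join("?" for _ in values)
                conn.execute(f"INSERT INTO features ({quoted}) VALUES ({marks})", values)
    except sqlite3.IntegrityError:
        return "error: duplicate RefSeq id"
    return "ok"


conn = sqlite3.connect(":memory:")
header = {"refseq_id": "NC_000000.1", "organism": "Example organism",
          "definition": "Example organism chromosome 1"}
features = [{"type": "CDS", "start": 100, "end": 400, "label": "CHP1",
             "qualifiers": {"product": "hypothetical protein"}}]
ensure_schema(conn, ["product"])
first = insert_record(conn, header, features)
second = insert_record(conn, header, features)  # same RefSeq id again
```

Making the RefSeq id the primary key of the header table lets the database itself enforce the no-duplicates rule of step 3, and wrapping both inserts in one transaction means a failure leaves no partial record behind.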

fasta.php

This algorithm is very simple: it runs a single database query for all features with a user-specified label, then prints the results as text in the FASTA file format, ready for analysis in other programs.

The user specifies the desired label by adding ?label=xxxx to the HTTP request string, where xxxx is the label string. For example, to get a FASTA file with all CDSs labelled ‘CHP1’, one simply writes ‘fasta.php?label=CHP1’ in the browser address bar.
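
The FASTA output step can be sketched as follows, again in Python rather than PHP. The function name to_fasta, the layout of the header line and the 60-character line width are assumptions chosen for illustration; the thesis does not specify how fasta.php composes its headers. The rows are given as a plain list standing in for the result of the database query on the user-specified label.

```python
def to_fasta(rows, label, width=60):
    """Format (organism, refseq_id, translation) rows as FASTA text.

    `rows` stands in for the result of a query such as
    "SELECT organism, refseq_id, translation ... WHERE label = ?".
    """
    entries = []
    for organism, refseq_id, translation in rows:
        header = f">{label}|{refseq_id}|{organism.replace(' ', '_')}"
        # Wrap the amino acid sequence at a fixed width, as most tools expect.
        chunks = [translation[i:i + width] for i in range(0, len(translation), width)]
        entries.append("\n".join([header] + chunks))
    return "\n".join(entries) + "\n"


rows = [("Leishmania major", "NC_000000.1", "M" * 70)]
fasta_text = to_fasta(rows, "CHP1")
```

Putting the custom label and the organism name in the header line keeps sequences from different species distinguishable after alignment.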

image.php

This page contains the most important part of the software as initially

designed: the creation of image files clearly representing loci of labeled sequences

in a number of related species for visual inspection and publication. Up to 18

feature labels can be displayed simultaneously, using highly contrasting colors for

differentiation (TRUBETSKOY, 2017). Its rendered output can be adjusted by a few

parameters given as variables in the HTTP request string, adding a ? character after

image.php in the address bar, followed by these optional variables:

● &width=####px — where #### is any number larger than 500

This sets the image width in pixels and affects the resolution and quality of the final output: larger numbers give sharper images at the cost of larger files. The user can test different sizes to determine the ideal resolution for their use case. If this option is absent or set to a value smaller than 500px, the algorithm defaults to 1900px, which is sufficient for printing the resulting images in portrait orientation at over 300dpi, the minimum resolution required by most publications.

● &zoom=# — where # can be either 0 or 1

This determines whether the algorithm shows the loci in relation to the entire sequence of each GenBank entry in the database, which is the standard behavior when this option is not set or is set to 0. When this option is set to 1, each GenBank sequence is truncated between the first and last occurrences of the features labelled by the user at step 6 of parser.php, restricted to the list provided by the &feat option explained below. This is especially useful when a GenBank record is very large and the locus of interest is small, as it effectively provides a simple zooming ability.

● &feat=xxx,xxx,xxx — where xxx are the names of features of interest

This is the list of feature labels shown by the image, comma-

separated. It can be no longer than 18 labels; any extra labels will be

ignored due to design constraints. The user may have tagged tens or

hundreds of features with many different labels when inserting the

GenBank record, and showing all the distinct labels at once may not

be useful or even possible inside one image. If the user wishes to

compare different loci in different images, here is where it can be

done, by providing the algorithm with the specific list of labels to be

used for each image, and generating different images. If the user

does not use this optional parameter, the algorithm retrieves the first

18 distinct labels from the database.
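The validation rules for the three optional parameters described above can be summarized in a short sketch. It is written in Python purely for illustration and is not the code of image.php; the function names are hypothetical, and treating a width of exactly 500 as valid is an assumption, since the text only covers values above and below that number.

```python
from urllib.parse import parse_qs

DEFAULT_WIDTH_PX = 1900  # default stated in the text
MIN_WIDTH_PX = 500       # smaller values fall back to the default
MAX_LABELS = 18          # labels beyond the 18th are ignored


def parse_image_params(query_string):
    """Return validated (width_px, zoom, labels) from a URL query string."""
    params = parse_qs(query_string.lstrip("?&"))

    # &width=####px: accept "2400px" or "2400"; anything else uses the default.
    raw_width = params.get("width", [""])[0].strip().lower()
    if raw_width.endswith("px"):
        raw_width = raw_width[:-2]
    width_px = int(raw_width) if raw_width.isdecimal() else DEFAULT_WIDTH_PX
    if width_px < MIN_WIDTH_PX:
        width_px = DEFAULT_WIDTH_PX

    # &zoom=#: only the literal value 1 switches zooming on.
    zoom = params.get("zoom", ["0"])[0] == "1"

    # &feat=a,b,c: comma-separated labels, at most 18. None means "use the
    # first 18 distinct labels found in the database" (lookup not shown here).
    raw_feat = params.get("feat", [""])[0]
    labels = [name.strip() for name in raw_feat.split(",") if name.strip()]
    return width_px, zoom, (labels[:MAX_LABELS] or None)


def print_width_inches(width_px, dpi=300):
    """Physical width of the image when printed at the given resolution."""
    return width_px / dpi
```

With these definitions, the default 1900px image printed at 300dpi is about 6.3 inches wide, which is consistent with the portrait-orientation figure quoted above.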

These are the steps done for image creation:

1. All the different formatting information has to be set up first: using

the pixel width obtained from the user via the optional &width

parameter (or the standard 1900px value if there was no user input),

the program determines all the other relative values, such as font

size, image canvas height, indicator triangles height, etc. It also

verifies if the user has passed optional parameters &feat, &width or

&zoom and validates them;

2. Then the algorithm retrieves all GenBank records from the database

that contain the labels provided by the user with the optional &feat

parameter, or all records if no parameter was passed;

3. If the user passed the optional parameter &zoom=1, the program

finds the lower and upper bounds in each GenBank record

corresponding to the first and last occurrences of labelled features in

the sequence; if no &zoom parameter was passed, the lower and

upper drawing bounds are set to 0 and to the GenBank record size in

base-pairs, respectively;

4. With all necessary data prepared, the program creates the HTML

canvas element and iterates through all the feature labels (set by the

user or retrieved from the database), creating color-coded

references at the top of the image for each label. 18 contrasting

colors are preset;

5. At the top left corner of the image a scale bar is printed, an essential

visual feature for comparing GenBank records of different sizes. The

scale is calculated to increase in increments of 1 kbp from 1 to 9 kbp

and in increments of 10 kbp from 10 kbp upwards;

6. The printing of the actual genomic data is done in several sub-steps:

a. First, each GenBank record’s species name is printed on the

left panel, with the genus name abbreviated, in bold italics. In

a smaller font the RefSeq id of the sequence is printed below

the species name, ensuring a very compact and easy to read

label that nonetheless is uniquely identified by the RefSeq id;

b. Then the unlabeled features are printed to the right of the

species’ names, coding sequences (CDS) in dark gray and

other types of features in light gray;

c. The labelled features are then printed in color according to

the legend printed at step 4, with slightly taller bars and

triangular indicators on the top or bottom: triangles are

placed at the top or bottom in an alternating fashion, avoiding

excessive clumping in very large sequences with closely

spaced labelled features;

d. Finally, a small rotated number is printed at the start and end

of each feature sequence, indicating the exact base-pair

position of start and end in the original GenBank record.
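Steps 3, 5 and 6c above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual image.php code: the linear base-pair-to-pixel mapping, the choice of one tenth of the visible span as the budget for the scale bar, and starting the triangle alternation at the top are all assumptions, and every function name is hypothetical.

```python
def drawing_bounds(record_length_bp, labelled_features, zoom):
    """Return (lower, upper) drawing bounds in base pairs for one record.

    labelled_features is a list of (start_bp, end_bp) tuples.
    """
    if zoom and labelled_features:
        lower = min(start for start, _ in labelled_features)
        upper = max(end for _, end in labelled_features)
        return lower, upper
    return 0, record_length_bp


def bp_to_pixel(position_bp, lower, upper, track_width_px):
    """Map a base-pair position linearly onto a track of the given width."""
    span = max(upper - lower, 1)  # guard against a zero-length window
    return round((position_bp - lower) / span * track_width_px)


def scale_bar_kbp(visible_span_bp, span_divisor=10):
    """Pick a scale-bar length in kbp: 1 to 9 in steps of 1, then 10, 20, 30, ...

    span_divisor (assumed) limits the bar to a fraction of the visible span.
    """
    budget_kbp = visible_span_bp // span_divisor // 1000
    if budget_kbp < 1:
        return 1  # smallest allowed bar
    if budget_kbp < 10:
        return budget_kbp
    return budget_kbp // 10 * 10  # round down to a multiple of 10 kbp


def triangle_sides(feature_count):
    """Alternate indicator triangles between top and bottom (top first, assumed)."""
    return ["top" if i % 2 == 0 else "bottom" for i in range(feature_count)]
```

A single labelled feature with zoom enabled makes the window exactly as wide as that feature, which is the situation discussed with the sample images in the results.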

This algorithm is very close to its final stage, pending only suggestions from

the beta testers and a few refinements to the zoom mechanism.

Results and sample images

We present below several examples of generated loci images from

image.php, with comments on specific use cases. The output of

fasta.php consists of standard FASTA files with no new features, so it suffices to say

that the algorithm produces the expected output; no samples are shown in

these results.

Three species with fully annotated genomes were used for this

demonstration: Leishmania major [RefSeq NC_007256.2], L. mexicana [RefSeq

NC_018319.1], and L. donovani [RefSeq NC_018242.1] (DOWNING et al., 2011;

ROGERS et al., 2011). Four classes of proteins were selected and labelled using the

software as described in the previous section: ecotin-like serine peptidase

inhibitors labelled “ecotin”, a “katanin-like” microtubule-severing protein, a

ribonucleoprotein labelled “ribonucleoproteinP18” and a membrane associated

protein, which was purposefully labelled with the very long name “tb-292

membrane associated protein-like protein” for display purposes. All images below

were generated from the same dataset.

The first image, fig. 1, is an example of an export done

without any parameters given by the user, with the default values: no zoom,

1900px image and all the feature labels displayed. Dark gray bars represent coding

sequences (CDS), light gray bars represent non-CDS features, and colored bars are

labelled CDSs. Note how the algorithm gracefully handles excessively long feature

labels by compressing them laterally.

Figure 1. Sample loci visualization generated without any parameters passed

Figures 2 and 3 represent the exact same data (with full-sequence and

zoomed-in versions displayed together), but with the &width parameter passed

with values 550px and 4400px respectively, to demonstrate that the relative sizes

of fonts and most other image components do not change with the provided pixel

width: they are stable because they are all set as proportions of the user-provided

pixel width. The overblown borders in colored components vary in size more

abruptly because of rounding errors in their border widths, and this is not currently

fixable due to limitations in the HTML canvas element used for rendering. The

difference in quality due to resolution is quite visible, making clear that widths

below 1000px should not be used except for extremely simple genomes:

Figure 2. Sample loci visualization - 550px wide

Figure 3. Sample loci visualization - 4400px wide

Figures 4 to 7 represent the same data, but with the user-specified &feat

parameter selecting a single label in each image pair (all shown with and without

the &zoom parameter). Note how the first color is always red, for maximum

contrast:

Figure 4. Sample loci visualization - “ecotin” label only

It is important to note that for coding sequences appearing only once in

each GenBank record, such as the katanin-like protein shown in fig. 5, it makes no

sense to use the zoomed-in view as it will simply show a large bar for each species

and the scale bar will be compressed into a single vertical bar.

Figure 5. Sample loci visualization - “katanin-like” label only

Likewise, for very closely spaced features like those in fig. 6, the zoomed-in

view might be too restrictive in this version of the software, as the zoom level is

not user-controllable.

Figure 6. Sample loci visualization - “ribonucleoproteinP18” label only

Fig. 7 further illustrates that only GenBank sequences that have at least one labelled

feature will be included in the image, so as to avoid duplications: only L. mexicana

has a feature with the label “tb-292 membrane associated protein-like protein”, so

it is the only GenBank record shown.
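The record selection behavior illustrated by fig. 7 can be sketched as a simple filter. This is a hypothetical Python illustration, not the query used by image.php: the function name and the dictionary layout (record id mapped to the labels present in that record) are assumptions made for the example.

```python
def records_to_draw(record_labels, wanted_labels=None):
    """Return the ids of the records that should appear in the image.

    record_labels maps a record id to the list of labels found in that record.
    When no labels are requested, every record is returned.
    """
    if not wanted_labels:
        return list(record_labels)
    wanted = set(wanted_labels)
    return [
        record_id
        for record_id, labels in record_labels.items()
        if wanted & set(labels)  # keep records sharing at least one label
    ]
```

A record that carries none of the requested labels is skipped entirely, so an image built from a label present in a single record contains a single row.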

Figure 7. Sample loci visualization - “tb-292 membrane associated protein-like protein” label only

Figures 8, 9 and 10 represent combinations of 2 and 3 different labels to

illustrate the potential variety of visual representations that can be generated from

a single dataset, once it is imported and labelled into the system. They also make

clear that the legend colors depend only on the label order, so the first label in the

user passed list will be red, the second one will be green, the third yellow, the

fourth blue and so on.
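The order-dependent legend coloring described above amounts to pairing labels with a fixed palette by position. The sketch below is a Python illustration with a hypothetical function name; only the first four colors are named in the text, so the remaining fourteen presets are deliberately left out rather than guessed.

```python
# Only these four colors are named in the text; the other fourteen
# presets are not listed there and are omitted from this sketch.
DOCUMENTED_COLORS = ["red", "green", "yellow", "blue"]


def legend_colors(labels, palette=DOCUMENTED_COLORS):
    """Pair each label with a color by position in the list.

    zip() stops at the shorter input, so labels beyond the palette are dropped,
    mirroring how labels beyond the preset colors are ignored.
    """
    return list(zip(labels, palette))
```

Because the color depends only on position, the same label can be red in one image and green in another, as happens with the “ecotin” label when it is passed first or second.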

Figure 8. Sample loci visualization - “tb-292 membrane associated protein-like protein” and “ecotin” labels

Figure 9. Sample loci visualization - “ecotin”, “tb-292 membrane associated protein-like protein” and “ribonucleoproteinP18” labels

Figure 10 is a useful illustration of the care needed when using the zoomed-

in view: since one of the labels only occurs once in L. major, it is shown with no

genomic context.

Figure 10. Sample loci visualization - “tb-292 membrane associated protein-like protein” and “ribonucleoproteinP18” labels

Future improvements and functionalities

Although very useful as it is, this software has ample room for

improvements and creation of new functionalities. Below we present a short list

of some of the upgrades planned for the near future.

● Writing a module for labeling and modifying labels of GenBank

sequences already imported into the database, allowing users to first

ingest a large amount of data and then work on labeling the areas of

interest inside the software itself;

● Writing a module that allows labels to be sorted into classes,

facilitating multiple analyses of the same data;

● Making various alterations to the image.php module, to make it

interactive, with real-time zooming and the addition of feature detail

mouse-overs: when the user moves the mouse over the features in

the image, a pop-up pane will display the details of the feature.

● Integration with various online APIs (Application Programming

Interfaces, tools that facilitate data exchange between different

systems), allowing access to tools such as sequence alignment and

BLAST searches.

Conclusion

The original problem that motivated this project was successfully solved: the

loci viewer is working exactly as planned, and it is proving very useful for the

intended use case. It is still in beta and with an incomplete user interface, but as

these issues are being resolved, it can grow to become a more mature package,

with multiple modules feeding off the same database. The use of free and open-

source components guarantees that it has the flexibility to be altered and

expanded not only by its original designer, but by anyone else with programming

skills.

This still unnamed software project is a good example of the kind of

interdisciplinary effort that should make up the bulk of modern science but

unfortunately still does not. Computer software is absolutely essential for modern

genomics, but very few biologists are well trained in its inner workings. Software

that works as a black box is dangerous in science: if the user does not understand

the algorithmic process well, he or she can get beautiful results that are completely

disconnected from reality. Small in-house projects like this one are a way to both get

programming experience and foster interdisciplinarity in biology labs.

Chapter II — Ecotin in Kinetoplastida

Introduction

General aspects of the Trypanosomatida order

Kinetoplastida are a class of unicellular eukaryotes characterized by the

presence of the kinetoplast, a feature formed by circular concatenated DNA

molecules (kDNA) inside a solitary mitochondrion. Trypanosomatida is an order of

kinetoplastids composed of obligatory parasites of invertebrates, plants and all

classes of vertebrates (BORGHESAN et al., 2013; MARCILI, 2008). This order

includes genera that are exclusive arthropod parasites, such as Crithidia

(CAMARGO et al., 1992) and Herpetomonas (BORGHESAN et al., 2013), and also

genera with heteroxenous life cycles (involving more than one host), such as

Leishmania and Trypanosoma. These two genera are of great medical importance

as they contain species that are etiological agents for serious human diseases, such

as sleeping sickness, Chagas disease, and visceral and cutaneous leishmaniasis. Some

Trypanosomatida genera have obligate endosymbiont bacteria, having been used

as model organisms in studies on unicellular symbiosis (ALVES et al., 2013; MOTTA

et al., 2010).

Species in the Leishmania genus are flagellated protozoa with heteroxenous

life cycles. Their life cycle has two main phases, each with morphological and

physiological characteristics adapted to life inside invertebrate and vertebrate

hosts. Promastigotes are flagellated and motile, adapted to extracellular survival in

the digestive tract of vector insects, while amastigotes are unable to move,

adapted to intracellular survival inside vertebrate macrophages (GRIMALDI; TESH,

1993; REY, 2008). Leishmania insect vectors belong to the order Diptera, the

true flies (REY, 2008).

Species in the Trypanosoma genus, also heteroxenous, usually have more

complex life cycles (RAMOS et al., 2005). In trypanosomatids belonging to the

Stercoraria section, such as Trypanosoma cruzi, the metacyclic trypomastigote

infective form penetrates the skin or skin lesion of the vertebrate host after being

expelled with the feces of the insect. In members of the Salivaria section, such as

Trypanosoma brucei, the metacyclic trypomastigote is injected directly from the

salivary gland of the insect host into the bloodstream of the vertebrate host. Once

inside the vertebrate host the infective metacyclic trypomastigotes transform into

bloodstream trypomastigotes which invade cells of various tissues and, in the case

of T. cruzi, transform into amastigotes, static forms that multiply inside the cells by

binary fission. After a number of division cycles, T. cruzi amastigotes transform into

bloodstream trypomastigotes and are released into the circulatory system,

infecting other cells in the body. T. brucei trypomastigotes can directly multiply by

binary fission. When a hematophagous insect vector feeds on the blood of an

infected vertebrate it consumes bloodstream trypomastigote forms, which

differentiate inside the insect into procyclic trypomastigotes (in some species) and

then into epimastigotes, which are capable of multiplying by binary fission. These

life cycle stages of the parasites populate the posterior digestive tract (in section

Stercoraria) or the salivary gland (in section Salivaria) of the host. The cycle is

closed with the transformation of epimastigote forms into metacyclic

trypomastigote forms that infect a new vertebrate host (HOARE, 1972; REY, 2008).

Species in this genus can have different arthropod vectors: Chagas disease agent

T. cruzi and all other stercorarian trypanosomes are transmitted by hemipteran

insects in the Reduviidae family, while sleeping sickness agent T. brucei and all

other salivarians are transmitted by dipteran vectors (REY, 2008).

The phylogeny of trypanosomatids is constructed based mostly on

molecular studies, because the external morphology of these unicellular organisms

is often indistinguishable between species (STEVENS et al., 1999; STEVENS; NOYES;

GIBSON, 1998). The most common molecular marker used for barcoding

trypanosomatids is the V7V8 region of the 18S rDNA, being widely used in the

description of inter- and intraspecific relations in this clade (CORTEZ et al., 2006; DA

SILVA et al., 2004; FERREIRA et al., 2007; LIMA et al., 2012; MARCILI et al., 2009;

RODRIGUES et al., 2006; TEIXEIRA et al., 2011; VIOLA et al., 2009). Other genes used

in trypanosomatid evolutionary studies are the gGAPDH gene for the

glyceraldehyde-3-phosphate dehydrogenase protein (HAMILTON et al., 2004;

HAMILTON; GIBSON; STEVENS, 2007) and the CATL gene encoding the cathepsin

enzyme (PÉREZ, 2012).

Recent studies indicate that the heteroxenous life cycle in trypanosomatids

may have an evolutionary history beginning before the start of the Tertiary period,

but the overall Kinetoplastida phylogeny is still filled with uncertainty, even more

so among trypanosomatids (VOTÝPKA et al., 2015). The existence of bacterial

endosymbionts in the group is of marked interest to researchers, possibly being

related to the transition from free-living to parasitic life cycles or being involved in lateral

gene transfers between bacteria and eukaryotes (ALVES et al., 2013).

Trypanosomatids have polycistronic DNA transcription which tends to keep coding

sequences conserved in contiguous groups, resulting in multiple gene loci being

preserved between different species (JACKSON, 2015).

Serine Protease Inhibitors: ecotins and Trypanosomatida ISPs

Ecotins are serine protease inhibitors initially described in E. coli bacteria

and named for their capacity to inhibit the digestive enzyme trypsin (E. coli trypsin

inhibitor) (CHUNG et al., 1983). The E. coli ecotin has a molecular weight of 18 kDa

and is expressed in the cellular periplasm with a homodimeric active form. It

inhibits serine proteases of family S1A including trypsin, chymotrypsin, neutrophil

elastase and cathepsin G (CHUNG et al., 1983; MCGRATH; GILLMOR; FLETTERICK,

1995; YANG et al., 1998). Ecotin activity protects cells against exogenous serine

peptidases involved in various biological processes, including coagulation and

fibrinolysis; this capacity for inhibiting a considerable number of different proteins

differentiates ecotin from most other serine protease inhibitors, which generally

are highly specific (EGGERS et al., 2001, 2004; LAUWEREYS; LAMBEIR, 1996).

Trypanosomatids are the only eukaryotes with genes coding for ecotin

analogs, described for the first time in 2005 by Ivens et al. in L. major, with three

variants that were named ISP1, ISP2 and ISP3 (IVENS et al., 2005). In L. major the

ISP1 and ISP2 ecotins have molecular weights of 16.5 and 17.5 kDa respectively, and while structurally

similar to the E. coli ecotin, their amino acid sequence identity is only 36%

(MORRISON et al., 2012), and they have different patterns of expression and

inhibitory activity in the various stages of the parasite life cycle (ESCHENLAUER et

al., 2009). In L. major the ISP1 variant is expressed in larger quantities in the life

cycle forms living in the insect host, and knockout studies with this gene suggest

that it has endogenous functions, mainly in the flagellar formation process

(MORRISON et al., 2012). Also in L. major, ISP2 expression occurs in all life cycle

stages and there is evidence that this inhibitor participates in the parasite

macrophage infection process in hosts, by inhibiting serine proteases such as

neutrophil elastase in vertebrates; there is evidence that Leishmania parasites with

knocked down ISP2 suffer more intense phagocytosis by host macrophages

(ESCHENLAUER et al., 2009; FARIA et al., 2011; RIBEIRO-GOMES et al., 2007). E. coli

and other bacteria that have periplasmic ecotin use it to evade hosts’ immune

systems, and L. major employs its ISP2 inhibitor in a similar fashion (EGGERS et al.,

2004; IRELAND et al., 2014). The lack of genes coding for ecotin target enzymes

(the S1A family of serine proteases) in both E. coli and L. major is a strong indicator

of the probable role of ecotins in these species’ interactions with vertebrate hosts

(ESCHENLAUER et al., 2009).

An ISP2 homolog has been found in T. cruzi with a high degree of sequence

similarity to the L. major gene (EL-SAYED et al., 2005). BLAST searches in the NCBI

GenBank database reveal that other members of the genus Trypanosoma also

possess ISP2 homologues, as well as close relatives in the order Trypanosomatida

such as Leptomonas spp., although most papers published on the subject have focused on

Leishmania ISPs. It is probable that, due to both its conservation in various species

of trypanosomatids and to its flexible functional properties, ecotin homologs have

offered some fitness gain to trypanosomatids with vertebrate hosts. Also, the

conservation of ISPs in various species indicates an origin in the common ancestor

of Trypanosomatida. The similarity between trypanosomatid ISPs and bacterial

ecotins leads us to hypothesize a lateral gene transfer between E. coli-like

bacteria and the common ancestor of the various Trypanosomatida genera as the

origin of ISPs (ESCHENLAUER et al., 2009; OPPERDOES; MICHELS, 2007). Recent

research suggests that this kind of lateral gene transfer has been essential in this

group’s evolutionary history (ALVES et al., 2013). The bacterial endosymbionts in

Kinetoplastida are in class Betaproteobacteria (ANDRADE et al., 2011); this group

contains vertebrate infecting species that not only possess ecotin encoding genes,

but that depend on those ecotins being expressed to maintain their virulence

(IRELAND et al., 2014), which may be another hint of ancestral lateral gene

transfers between Betaproteobacteria and Trypanosomatida.

Leishmania and Trypanosoma parasites are responsible for a number of

severe neglected tropical diseases, as officially listed by the World Health

Organization (MOLYNEUX, 2013; NEGLECTED TROPICAL DISEASES, 2015). Multiple

sources indicate that ecotin and its homologs are connected to these parasites’

infective capacity, but research in this specific subject is still scarce, especially in the

Trypanosoma genus: neglected tropical diseases like Chagas disease (caused by T.

cruzi) and African sleeping sickness (caused by T. brucei) are neglected for socio-

historical reasons, as these afflictive diseases rarely, if ever, occur in developed

countries.

Studying the evolution of the genes encoding for trypanosomatid ISPs can

shed light not only on the group’s evolutionary history, but also on the overall

importance of these inhibitors for future researchers. In the following sections we show

evidence for a common ancestry of ISPs in extant trypanosomatids using both

phylogenetic inferences and a novel method for gene loci analysis.

Methodology

Amino acid sequences of ISPs were obtained from the NCBI database

(National Center for Biotechnology Information, U.S. National Library of Medicine)

using their BLAST search package with E. coli ecotin as the BLAST query (BORATYN

et al., 2013). Sequence alignment using the MUSCLE algorithm and phylogenetic

maximum likelihood analysis were done in SeaView (GOUY; GUINDON; GASCUEL,

2010), with selection of best-fit amino acid substitution matrices done with

ProtTest 3 (DARRIBA et al., 2011). Resulting tree files were manually edited to

standardize terminal labels, and cladogram image files were exported using the iTOL

web tree tool (LETUNIC; BORK, 2011).

For the loci viewer project database input, complete genomic sequences for

loci analysis were obtained from the same database using tBLASTn, a tool that

takes amino acid sequences as input and searches for corresponding nucleotide

sequences, with the search limited to RefSeq annotated genomes (BORATYN et al.,

2013). Data input and manipulation were done as described in the algorithm section

in Chapter I. The complete list of sequences with respective web links and NCBI

GenBank IDs is given in Table 1. After data input was completed, the database was

manually queried using SQL to identify genes close to the left or right

of the ecotin homologs in various species, aided by the visual map generated

showing that ISPs occurred in two different loci around fifty kbp apart in most

species. Four such genes were selected: one is a putative katanin-encoding gene, and the other

three are conserved hypothetical protein-encoding genes that were called CHP1, 2

and 3. Using the L. braziliensis genome as a reference, these four amino acid

sequences were run through the NCBI tBLASTn tool using the same settings used

for the ecotin homologs, and resulting CDSs were manually labelled CHP1, CHP2,

CHP3 and katanin-like in the loci viewer database.
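The kind of SQL query used to look for genes flanking the ecotin homologs can be illustrated with a small sketch. The actual schema of the loci viewer database is not given in this chapter, so the table and column names below (features, record_id, name, label, start_bp, end_bp) are hypothetical, as is the function name; SQLite is used here only so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one row per annotated feature of a GenBank record.
SCHEMA = """
CREATE TABLE features (
    id        INTEGER PRIMARY KEY,
    record_id TEXT    NOT NULL,
    name      TEXT    NOT NULL,
    label     TEXT,               -- NULL for unlabelled features
    start_bp  INTEGER NOT NULL,
    end_bp    INTEGER NOT NULL
)
"""


def neighbours_of_label(conn, label, window_bp):
    """Return features lying within window_bp of any feature carrying `label`.

    Features that carry the same label as the anchor are left out, so that
    two nearby copies of the labelled gene do not report each other.
    """
    query = """
        SELECT DISTINCT n.record_id, n.name, n.start_bp, n.end_bp
        FROM features AS anchor
        JOIN features AS n
          ON n.record_id = anchor.record_id
         AND n.id <> anchor.id
         AND (n.label IS NULL OR n.label <> anchor.label)
         AND n.start_bp <= anchor.end_bp + ?
         AND n.end_bp >= anchor.start_bp - ?
        WHERE anchor.label = ?
        ORDER BY n.record_id, n.start_bp
    """
    return conn.execute(query, (window_bp, window_bp, label)).fetchall()
```

Running such a query with the “ecotin” label and a window of a few kbp returns, per record, the candidate flanking genes to the left and right of each homolog, which can then be inspected and labelled by hand.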

Table 1 - List of RefSeq genomic records inserted into the loci image generator database

Species and strain | RefSeq ID | URL
Leishmania braziliensis MHOM/BR/75/M2904 | NC_009307.2 | https://www.ncbi.nlm.nih.gov/nuccore/NC_009307.2
Leishmania donovani | NC_018242.1 | https://www.ncbi.nlm.nih.gov/nuccore/NC_018242.1
Leishmania infantum JPCM5 | NC_009399.2 | https://www.ncbi.nlm.nih.gov/nuccore/NC_009399.2
Leishmania major strain Friedlin | NC_007256.2 | https://www.ncbi.nlm.nih.gov/nuccore/NC_007256.2
Leishmania mexicana MHOM/GT/2001/U1103 | NC_018319.1 | https://www.ncbi.nlm.nih.gov/nuccore/NC_018319.1
Leishmania panamensis | NC_025860.1 | https://www.ncbi.nlm.nih.gov/nuccore/NC_025860.1
Leptomonas pyrrhocoris | NW_015438382.1 | https://www.ncbi.nlm.nih.gov/nuccore/NW_015438382.1
Leptomonas pyrrhocoris | NW_015438394.1 | https://www.ncbi.nlm.nih.gov/nuccore/NW_015438394.1
Trypanosoma brucei brucei TREU927 | NC_007278.1 | https://www.ncbi.nlm.nih.gov/nuccore/NC_007278.1
Trypanosoma brucei gambiense DAL972 | NC_026738.1 | https://www.ncbi.nlm.nih.gov/nuccore/NC_026738.1
Trypanosoma cruzi strain CL Brener | NW_001849489.1 | https://www.ncbi.nlm.nih.gov/nuccore/NW_001849489.1
Trypanosoma grayi | NW_008825978.1 | https://www.ncbi.nlm.nih.gov/nuccore/NW_008825978.1
Trypanosoma grayi | NW_008826261.1 | https://www.ncbi.nlm.nih.gov/nuccore/NW_008826261.1

Although the main focus of this research is pre-existing data analysis, we

also expressed an ISP2 ecotin homolog from T. cruzi to test its in-vitro activity

against a range of serine proteases. Enzymatic activity tests are still ongoing. For

this part of the project, supervised by Prof. Márcia Aparecida Sperança, PhD, the ISP2

coding gene of T. cruzi strain Y was amplified by PCR from purified DNA with

custom-designed primers. It was cloned and expressed at the Federal University of

ABC’s Pathogenic Agents Laboratory using standard recombinant expression

techniques (SAMBROOK; RUSSELL, 2001).

Results and discussion

Recombinant T. cruzi ISP2 was successfully expressed and purified and is currently undergoing enzymatic assays at the Federal University of ABC’s Pathogenic Agents Laboratory, conducted by a separate study group led by Prof. Sperança. Preliminary results indicate that the protein inhibits serine proteases such as trypsin and neutrophil elastase very strongly, at least as strongly as E. coli ecotin. Numeric data are not yet available because these assays are still in progress, but the preliminary results suggest that T. cruzi ISP2 probably acts similarly to Leishmania spp. ISP2, protecting the parasite by inhibiting host macrophage action (FARIA et al., 2011).

Maximum likelihood phylogenetic inference resulted in the tree shown in fig. 11, with E. coli ecotin as the outgroup and ISPs 1, 2 and 3 color-coded. The tree topology strongly indicates that the ISP ecotin homologs diverged from each other long ago, and it is probable that at least ISP1 and ISP2 have been present in these organisms’ genomes at least since the split between the Leishmania and Trypanosoma genera.


Figure 11. Maximum likelihood cladogram of ecotin homologs (ISPs), with ISP1 labelled in orange, ISP2 labelled in green and ISP3 labelled in blue


Running the loci image generation software described in detail in chapter I produced the images in figures 12 and 13, which show the full chromosomes and a zoomed-in area of interest respectively. Figure 12 is only useful for comparing the overall position of the homologs within the chromosome in Leishmania records: ecotin homologs occur only in chromosome 15 of these species, and the data for other species are either incomplete or badly annotated, resulting in huge contig sequences. Figure 13 makes the picture somewhat clearer, but the data for T. grayi and Leptomonas pyrrhocoris sequences are still fragmentary.


Figure 12. Loci image generator result showing full GenBank records and all selected labels


The images show up to five ecotin homologs in various Leishmania species. This duplication was not reported in previous papers. Sequence data analysis and visual inspection of the genomic context strongly support the idea that ecotin homologs underwent several duplications and/or multiple lateral gene transfer events before the differentiation of modern Trypanosomatida genera. Close examination of fig. 13 shows that Leishmania spp. mostly retained all five ISP copies, while trypanosomes lost at least a few of them. To facilitate this analysis, a second zoomed-in image (fig. 14) was generated after manual database manipulation, changing all flanking protein labels to delim and identifying the ecotin homologs, as well as possible, as ISP1, ISP2 and ISP3. In these figures, the CHP1-3 and katanin-like labels are accessory labels: they serve to delimit the complete ecotin loci and helped in assigning the homologs to ISP1, ISP2 and ISP3.
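The relabeling step described above can be sketched as a single database update. This is a minimal illustration under stated assumptions: the table name features, its columns, the toy rows and the set of labels to keep are all hypothetical, and Python's built-in sqlite3 module stands in for the MySQL database used by the software so that the example is self-contained.

```python
import sqlite3

# Labels kept as-is; every other feature is treated as a flanking protein.
# This set and the table layout are assumptions made for illustration.
KEEP = ("ISP1", "ISP2", "ISP3", "CHP1", "CHP2", "CHP3", "katanin-like")


def relabel_flanking(conn):
    """Set the label of every feature not in KEEP to 'delim'."""
    placeholders = ",".join("?" for _ in KEEP)
    cur = conn.execute(
        f"UPDATE features SET label = 'delim' "
        f"WHERE label NOT IN ({placeholders})",
        KEEP,
    )
    conn.commit()
    return cur.rowcount  # number of rows that were relabeled


def demo():
    """Build a toy in-memory table and apply the relabeling to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (record TEXT, start_bp INTEGER, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO features VALUES (?, ?, ?)",
        [
            ("record_A", 100, "hypothetical protein"),
            ("record_A", 200, "ISP1"),
            ("record_A", 300, "ISP2"),
            ("record_A", 400, "CHP1"),
            ("record_A", 500, "unknown protein"),
        ],
    )
    changed = relabel_flanking(conn)
    labels = [
        row[0]
        for row in conn.execute("SELECT label FROM features ORDER BY start_bp")
    ]
    return changed, labels
```

Assigning individual homologs to ISP1, ISP2 or ISP3 remains a manual judgment based on sequence data and genomic context; only the bulk relabeling of flanking proteins lends itself to a one-line update like this.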


Figure 13. Loci image generator result showing GenBank records and all selected labels zoomed in to the area of interest


We know that ISPs were acquired from bacteria by horizontal gene transfer in Kinetoplastida because they appear in no other eukaryotes. The unanswered question is how. Looking at these zoomed-in images and keeping the cladogram in fig. 11 in mind, we can form a hypothesis for how trypanosomatids acquired ecotin homologs. The following paragraph is speculative, but given the evidence it is probably not far off the mark:


Figure 14. Loci image generator result created with the same data as Fig. 13, but with individual ISPs identified as ISP1, ISP2 and ISP3 and with flanking proteins identified simply as delim. Note that the blue delim proteins are not the same; they were kept in this image only to illustrate the gene loci correctly


The ancestor of all trypanosomatids either took part in multiple lateral gene transfers with ecotin-possessing bacteria, or a single transfer occurred and was followed by multiple gene duplications. If multiple gene transfers occurred, they probably happened no more than three times, once each for ISPs 1, 2 and 3, and the additional ISP2 and ISP3 copies carried by Leishmania spp. are the result of a subsequent duplication. The positions of ISP1 and ISP2 in T. brucei, in the first and second ecotin loci respectively, with this ISP2 probably being homologous to T. cruzi ISP2, point to an early locus duplication that occurred before the two genera split. In this scenario, Trypanosoma spp. subsequently lost copies of the gene. Their sequences appear much more compact than those of Leishmania spp. in the images presented, which suggests that more deletions occurred in Trypanosoma species than in Leishmania and lends credence to the idea that T. cruzi and T. brucei lost some of their ecotin copies. A possible sequence of events based on this limited dataset is as follows: the common Trypanosomatida ancestor had three ecotin copies in the first locus (at around position 120 kbp in chromosome 15), acquired either via several lateral transfers from bacteria or via a single lateral transfer followed by contiguous duplication. The ISP2 and ISP3 ancestors in this locus then underwent a simultaneous duplication event, creating the second locus at around 190 kbp. Subsequently, various species lost some of these copies.
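The proposed sequence of events can be written down as a toy model to check that the copy numbers are consistent with what the images show. This is only a sketch under stated assumptions: the function names are hypothetical, the positions are the approximate values given in the text, and the loss pattern is an illustration built from the T. brucei arrangement described above (ISP1 in the first locus, ISP2 in the second), not a reconstruction of actual deletion events.

```python
# Toy model of the hypothesized scenario; all names are illustrative.

def ancestral_loci():
    """Three ecotin copies in the first locus (~120 kbp, chromosome 15)."""
    return {120: ["ISP1", "ISP2", "ISP3"]}


def duplicate(loci, source_kbp, target_kbp, genes):
    """Copy the named genes from one locus into a new locus."""
    new = {pos: list(names) for pos, names in loci.items()}
    new[target_kbp] = [g for g in loci[source_kbp] if g in genes]
    return new


def lose(loci, losses):
    """Remove (position, gene) pairs and drop loci that become empty."""
    new = {}
    for pos, names in loci.items():
        kept = [g for g in names if (pos, g) not in losses]
        if kept:
            new[pos] = kept
    return new


def copy_count(loci):
    """Total number of ecotin homologs across all loci."""
    return sum(len(names) for names in loci.values())


# Simultaneous duplication of ISP2 and ISP3 creates the second locus.
after_duplication = duplicate(ancestral_loci(), 120, 190, {"ISP2", "ISP3"})

# Illustrative loss pattern matching the T. brucei arrangement in the text.
brucei_like = lose(after_duplication, {(120, "ISP2"), (120, "ISP3"), (190, "ISP3")})
```

In this model the duplication yields five copies in total, which agrees with the maximum of five homologs observed in Leishmania species, and the lineage-specific losses then account for the smaller copy numbers seen in trypanosomes.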

The preservation of ISP2 in almost all species is notable and makes sense given the ample evidence of its importance against the host immune system. It is also notable that T. brucei parasites preserved the ISP1 variant in all cases, while T. cruzi lost the ISP1 gene. Since ISP1 seems to be involved in motility and flagellar development in promastigotes inside the insect vector in Leishmania species (MORRISON et al., 2012), this could be a reason for its preservation in T. brucei and its loss in T. cruzi. These species are members of sections Salivaria and Stercoraria respectively, with different life cycles and methods of transmission. While T. cruzi is transmitted by hemipterans, with infecting parasites deposited with their feces on the vertebrate host, T. brucei lives in the salivary glands of dipteran insects and is injected through their proboscis, as are Leishmania species. It could be that the ISP1 ecotin variant gives some advantage to trypanosomatids with dipterans as their arthropod hosts. This association could be the subject of an interesting future research project, with potential public health applications.

These speculations are offered to encourage further research. As tempting as it is to affirm their validity, our data set is very limited in scope and, in some cases, of too poor quality to support bold claims. Automated genomic annotation can only go so far, and some of these sequences have errors, omissions and other problems. Looking closely at the L. mexicana data in fig. 13, for example, it seems that the coding sequences between the first ISP occurrence and the gene labelled CHP2 should be labelled ISP2 and ISP3, but in the automated annotation they appear as “unknown proteins”. Nevertheless, since the amount of available genomic data is growing so fast, these speculations can be developed further as more data become available.

One thing this work clearly shows is the ubiquity of large amounts of unreviewed genomic data online. The amount of information retrievable at very little monetary cost with free-to-use bioinformatics tools is huge, and in this world of big data and exponentially falling sequencing costs, this fact will become more obvious as time passes. The next generation of budding biologists may well have to learn programming before they learn the names of all the plant and animal families.


Final remarks

Biology is the most interdisciplinary of the hard sciences. A biologist can no longer get by with only an encyclopedic knowledge of plants and animals, as was common in the last century.

Today, working or aspiring biologists must contend with the fact that modern biology is as much about large amounts of data as it is about physiology and nomenclature. Embracing bioinformatic tools and methods is the only way forward for those in the field who do not want to be left behind, especially in genetics, ecology and other fields where the amount of data can clearly outstrip anyone’s capacity for unaided analysis.

Going forward, we must ourselves adapt to this brave new world of big data and big data analysis, and spur our peers to do the same, or risk fast becoming obsolete.


References

ALVES, J. M. P. et al. Endosymbiosis in trypanosomatids: the genomic cooperation between bacterium and host in the synthesis of essential amino acids is heavily influenced by multiple horizontal gene transfers. BMC evolutionary biology, [s. l.], v. 13, n. 1, p. 190, 2013. Disponível em: <http://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-13-190>. Acesso em: 15 nov. 2017.

ANDRADE, I. d. S. et al. Characterization of a porin channel in the endosymbiont of the trypanosomatid protozoan Crithidia deanei. Microbiology, [s. l.], v. 157, n. 10, p. 2818–2830, 2011. Disponível em: <http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.049247-0>. Acesso em: 15 nov. 2017.

BEAULIEU, A. Learning SQL. [s.l.] : O’Reilly Media, Incorporated, 2009. Disponível em: <https://market.android.com/details?id=book-1KybAgAAQBAJ>

BORATYN, G. M. et al. BLAST: a more efficient report with usability improvements. Nucleic acids research, [s. l.], v. 41, n. W1, p. W29–W33, 2013. Disponível em: <http://academic.oup.com/nar/article/41/W1/W29/1091045/BLAST-a-more-efficient-report-with-usability>. Acesso em: 16 nov. 2017.

BORGHESAN, T. C. et al. Molecular phylogenetic redefinition of Herpetomonas (Kinetoplastea, Trypanosomatidae), a genus of insect parasites associated with flies. Protist, [s. l.], v. 164, n. 1, p. 129–152, 2013. Disponível em: <http://dx.doi.org/10.1016/j.protis.2012.06.001>

CAMARGO, E. P. et al. Ribosomal DNA restriction analysis and synthetic oligonucleotide probing in the identification of genera of lower trypanosomatids. The Journal of parasitology, [s. l.], v. 78, n. 1, p. 40–48, 1992. Disponível em: <https://www.ncbi.nlm.nih.gov/pubmed/1310733>

Cascading Style Sheets. [s.d.]. Disponível em: <https://www.w3.org/Style/CSS/Overview.en.html>. Acesso em: 2 jun. 2018.

CASPER, J. et al. The UCSC Genome Browser database: 2018 update. Nucleic acids research, [s. l.], v. 46, n. D1, p. D762–D769, 2018. Disponível em: <http://dx.doi.org/10.1093/nar/gkx1020>

CHUNG, C. H. et al. Purification from Escherichia coli of a periplasmic protein that is a potent inhibitor of pancreatic proteases. The Journal of biological chemistry, [s. l.], v. 258, n. 18, p. 11032–11038, 1983. Disponível em: <https://www.ncbi.nlm.nih.gov/pubmed/6411724>

CORTEZ, A. P. et al. The taxonomic and phylogenetic relationships of Trypanosoma vivax from South America and Africa. Parasitology, [s. l.], v. 133, n. Pt 2, p. 159–169, 2006. Disponível em: <http://dx.doi.org/10.1017/S0031182006000254>

DARRIBA, D. et al. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics, [s. l.], v. 27, n. 8, p. 1164–1165, 2011. Disponível em: <http://dx.doi.org/10.1093/bioinformatics/btr088>


DA SILVA, F. M. et al. Phylogeny, taxonomy and grouping of Trypanosoma rangeli isolates from man, triatomines and sylvatic mammals from widespread geographical origin based on SSU and ITS ribosomal sequences. Parasitology, [s. l.], v. 129, n. Pt 5, p. 549–561, 2004. Disponível em: <https://www.ncbi.nlm.nih.gov/pubmed/15552400>

DOWNING, T. et al. Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance. Genome research, [s. l.], v. 21, n. 12, p. 2143–2156, 2011. Disponível em: <http://dx.doi.org/10.1101/gr.123430.111>

DUCKETT, J. Web Design with HTML, CSS, JavaScript and jQuery Set. [s.l.] : Wiley, 2014. Disponível em: <https://market.android.com/details?id=book-zb9EnwEACAAJ>

EGGERS, C. T. et al. The role of ecotin dimerization in protease inhibition. Journal of molecular biology, [s. l.], v. 308, n. 5, p. 975–991, 2001. Disponível em: <http://linkinghub.elsevier.com/retrieve/pii/S0022283601947540>. Acesso em: 26 jul. 2015.

EGGERS, C. T. et al. The periplasmic serine protease inhibitor ecotin protects bacteria against neutrophil elastase. Biochemical Journal, [s. l.], v. 379, n. Pt 1, p. 107–118, 2004. Disponível em: <http://dx.doi.org/10.1042/BJ20031790>

EL-SAYED, N. M. et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science, [s. l.], v. 309, n. 5733, p. 409–415, 2005. Disponível em: <http://dx.doi.org/10.1126/science.1112631>

ESCHENLAUER, S. C. P. et al. Influence of parasite encoded inhibitors of serine peptidases in early infection of macrophages with Leishmania major. Cellular microbiology, [s. l.], v. 11, n. 1, p. 106–120, 2009. Disponível em: <http://dx.doi.org/10.1111/j.1462-5822.2008.01243.x>

FARIA, M. S. et al. Leishmania inhibitor of serine peptidase 2 prevents TLR4 activation by neutrophil elastase promoting parasite survival in murine macrophages. Journal of immunology, [s. l.], v. 186, n. 1, p. 411–422, 2011. Disponível em: <http://dx.doi.org/10.4049/jimmunol.1002175>

FERREIRA, R. C. et al. Morphological and molecular diversity and phylogenetic relationships among anuran trypanosomes from the Amazonia, Atlantic Forest and Pantanal biomes in Brazil. Parasitology, [s. l.], v. 134, n. Pt 11, p. 1623–1638, 2007. Disponível em: <http://dx.doi.org/10.1017/S0031182007003058>

FINE, M. Beta Testing for Better Software. [s.l.] : Wiley, 2002. Disponível em: <https://market.android.com/details?id=book-rZa6DoPlAsgC>

GOUY, M.; GUINDON, S.; GASCUEL, O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular biology and evolution, [s. l.], v. 27, n. 2, p. 221–224, 2010. Disponível em: <http://dx.doi.org/10.1093/molbev/msp259>

GRIMALDI, G.; TESH, R. B. Leishmaniases of the New World: current concepts and implications for future research. Clinical microbiology reviews, [s. l.], v. 6, n. 3, p. 230–250, 1993. Disponível em: <https://www.ncbi.nlm.nih.gov/pubmed/8358705>


HAMILTON, P. B. et al. Trypanosomes are monophyletic: evidence from genes for glyceraldehyde phosphate dehydrogenase and small subunit ribosomal RNA. International journal for parasitology, [s. l.], v. 34, n. 12, p. 1393–1404, 2004. Disponível em: <http://dx.doi.org/10.1016/j.ijpara.2004.08.011>

HAMILTON, P. B.; GIBSON, W. C.; STEVENS, J. R. Patterns of co-evolution between trypanosomes and their hosts deduced from ribosomal RNA and protein-coding gene phylogenies. Molecular phylogenetics and evolution, [s. l.], v. 44, n. 1, p. 15–25, 2007. Disponível em: <http://dx.doi.org/10.1016/j.ympev.2007.03.023>

HOARE, C. A. The trypanosomes of mammals: a zoological monograph. Oxford: Blackwell Scientific Publ, 1972.

HTML 5.1 2nd Edition. [s.d.]. Disponível em: <https://www.w3.org/TR/html51/>. Acesso em: 2 jun. 2018.

IRELAND, P. M. et al. The serine protease inhibitor Ecotin is required for full virulence of Burkholderia pseudomallei. Microbial pathogenesis, [s. l.], v. 67-68, p. 55–58, 2014. Disponível em: <http://linkinghub.elsevier.com/retrieve/pii/S0882401014000059>. Acesso em: 15 nov. 2017.

IVENS, A. C. et al. The genome of the kinetoplastid parasite, Leishmania major. Science, [s. l.], v. 309, n. 5733, p. 436–442, 2005. Disponível em: <http://dx.doi.org/10.1126/science.1112680>

JACKSON, A. P. Genome evolution in trypanosomatid parasites. Parasitology, [s. l.], v. 142, n. S1, p. S40–S56, 2015. Disponível em: <http://www.journals.cambridge.org/abstract_S0031182014000894>. Acesso em: 15 nov. 2017.

JavaScript W3Schools. [s.d.]. Disponível em: <https://www.w3schools.com/jS/default.asp>. Acesso em: 2 jun. 2018.

LAUWEREYS, M. J.; LAMBEIR, A.-M. V. R. Ecotin as a factor Xa, XIa, and XIIa inhibitor, 5585259 A, 1996.

LETUNIC, I.; BORK, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic acids research, [s. l.], v. 39, n. Web Server issue, p. W475–8, 2011. Disponível em: <http://dx.doi.org/10.1093/nar/gkr201>

LIMA, L. et al. Evolutionary insights from bat trypanosomes: morphological, developmental and phylogenetic evidence of a new species, Trypanosoma (Schizotrypanum) erneyi sp. nov., in African bats closely related to Trypanosoma (Schizotrypanum) cruzi and allied species. Protist, [s. l.], v. 163, n. 6, p. 856–872, 2012. Disponível em: <http://dx.doi.org/10.1016/j.protis.2011.12.003>

MARCILI, A. Trypanosoma cruzi: Diversidade, relações filogenéticas e padrões ecogeográficos de isolados silvestres. 2008. Universidade de São Paulo, São Paulo, 2008.

MARCILI, A. et al. A new genotype of Trypanosoma cruzi associated with bats evidenced by phylogenetic analyses using SSU rDNA, cytochrome b and Histone H2B genes and genotyping based on ITS1 rDNA. Parasitology, [s. l.], v. 136, n. 6, p. 641–655, 2009. Disponível em: <http://dx.doi.org/10.1017/S0031182009005861>

MCGRATH, M. E.; GILLMOR, S. A.; FLETTERICK, R. J. Ecotin: lessons on survival in a protease-filled world. Protein science: a publication of the Protein Society, [s. l.], v. 4, n. 2, p. 141–148, 1995. Disponível em: <http://dx.doi.org/10.1002/pro.5560040201>

MOLYNEUX, D. Neglected tropical diseases. Community eye health / International Centre for Eye Health, [s. l.], v. 26, n. 82, p. 21–24, 2013. Disponível em: <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3756642/>. Acesso em: 27 jul. 2015.

MORRISON, L. S. et al. Ecotin-like serine peptidase inhibitor ISP1 of Leishmania major plays a role in flagellar pocket dynamics and promastigote differentiation. Cellular microbiology, [s. l.], v. 14, n. 8, p. 1271–1286, 2012. Disponível em: <http://dx.doi.org/10.1111/j.1462-5822.2012.01798.x>

MOTTA, M. C. M. et al. The Bacterium Endosymbiont of Crithidia deanei Undergoes Coordinated Division with the Host Cell Nucleus. PloS one, [s. l.], v. 5, n. 8, p. e12415, 2010. Disponível em: <http://dx.plos.org/10.1371/journal.pone.0012415>. Acesso em: 15 nov. 2017.

MySQL. [s.d.]. Disponível em: <https://www.mysql.com/>. Acesso em: 2 jun. 2018.

Neglected tropical diseases. 2015. Disponível em: <http://www.who.int/neglected_diseases/diseases/en/>. Acesso em: 8 out. 2015.

Notepad++. [s.d.]. Disponível em: <https://notepad-plus-plus.org/>. Acesso em: 2 jun. 2018.

OPPERDOES, F. R.; MICHELS, P. A. M. Horizontal gene transfer in trypanosomatids. Trends in parasitology, [s. l.], v. 23, n. 10, p. 470–476, 2007. Disponível em: <http://dx.doi.org/10.1016/j.pt.2007.08.002>

PÉREZ, H. A. G. Diagnóstico, caracterização molecular e epidemiologia de tripanossomas de ungulados. 2012. Universidade de São Paulo, São Paulo, 2012. Disponível em: <http://www.teses.usp.br/teses/disponiveis/42/42135/tde-20092012-084016/publico/HeraklesAntonioGarciaPerez_Doutorado.pdf>. Acesso em: 7 out. 2015.

PHILLIPS, A. WAMP It Up! Kindle ed. [s.l.] : Pintop Productions, 2012. Disponível em: <https://www.amazon.com/WAMP-Up-Arelthia-Phillips-ebook/dp/B007QOBZGG/ref=sr_1_1?s=books&ie=UTF8&qid=1529210716&sr=1-1&keywords=wamp>

PHP: Hypertext Preprocessor. [s.d.]. Disponível em: <http://php.net/>. Acesso em: 2 jun. 2018.

PRUITT, K.; BROWN, G.; MURPHY, M. RefSeq Frequently Asked Questions (FAQ). [s.l.] : National Center for Biotechnology Information (US), 2018. Disponível em: <https://www.ncbi.nlm.nih.gov/books/NBK50679/>. Acesso em: 17 jun. 2018.

RAMOS, F. et al. E. dispar strain: analysis of polymorphism as a tool for study of geographic distribution. Molecular and biochemical parasitology, [s. l.], v. 141, n. 2, p. 175–177, 2005. Disponível em: <http://dx.doi.org/10.1016/j.molbiopara.2005.02.010>

REY, L. Parasitologia. 4a. ed. Rio de Janeiro: Guanabara Koogan, 2008.


RIBEIRO-GOMES, F. L. et al. Neutrophils activate macrophages for intracellular killing of Leishmania major through recruitment of TLR4 by neutrophil elastase. Journal of immunology, [s. l.], v. 179, n. 6, p. 3988–3994, 2007. Disponível em: <https://www.ncbi.nlm.nih.gov/pubmed/17785837>

RODRIGUES, A. C. et al. Phylogeny of Trypanosoma (Megatrypanum) theileri and related trypanosomes reveals lineages of isolates associated with artiodactyl hosts diverging on SSU and ITS ribosomal sequences. Parasitology, [s. l.], v. 132, n. Pt 2, p. 215–224, 2006. Disponível em: <http://dx.doi.org/10.1017/S0031182005008929>

ROGERS, M. B. et al. Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania. Genome research, [s. l.], v. 21, n. 12, p. 2129–2142, 2011. Disponível em: <http://dx.doi.org/10.1101/gr.122945.111>

SAMBROOK, J.; RUSSELL, D. W. Molecular cloning: a laboratory manual. 3rd ed. Cold Spring Harbor, N.Y: Cold Spring Harbor Laboratory Press, 2001.

Sample GenBank Record. [s.d.]. Disponível em: <https://www.ncbi.nlm.nih.gov/genbank/samplerecord/>. Acesso em: 2 jun. 2018.

SKINNER, M. E. et al. JBrowse: a next-generation genome browser. Genome research, [s. l.], v. 19, n. 9, p. 1630–1638, 2009. Disponível em: <http://dx.doi.org/10.1101/gr.094607.109>

SQL W3Schools. [s.d.]. Disponível em: <https://www.w3schools.com/sql/>. Acesso em: 2 jun. 2018.

StatCounter Global Stats - Browser Usage Share. [s.d.]. Disponível em: <http://gs.statcounter.com/>. Acesso em: 2 jun. 2018.

STEIN, L. D. et al. The generic genome browser: a building block for a model organism system database. Genome research, [s. l.], v. 12, n. 10, p. 1599–1610, 2002. Disponível em: <http://dx.doi.org/10.1101/gr.403602>

STEVENS, J. R. et al. The ancient and divergent origins of the human pathogenic trypanosomes, Trypanosoma brucei and T. cruzi. Parasitology, [s. l.], v. 118, n. Pt 1, p. 107–116, 1999. Disponível em: <https://www.ncbi.nlm.nih.gov/pubmed/10070668>

STEVENS, J. R.; NOYES, H.; GIBSON, W. The Evolution of Trypanosomes Infecting Humans and Primates. Memórias do Instituto Oswaldo Cruz, [s. l.], v. 93, n. 5, p. 669–676, 1998. Disponível em: <http://dx.doi.org/10.1590/S0074-02761998000500019>

TEIXEIRA, M. M. G. et al. Phylogenetic validation of the genera Angomonas and Strigomonas of trypanosomatids harboring bacterial endosymbionts with the description of new species of trypanosomatids and of proteobacterial symbionts. Protist, [s. l.], v. 162, n. 3, p. 503–524, 2011. Disponível em: <http://dx.doi.org/10.1016/j.protis.2011.01.001>

TRUBETSKOY, S. List of 20 Simple, Distinct Colors. 2017. Disponível em: <https://sashat.me/2017/01/11/list-of-20-simple-distinct-colors/>. Acesso em: 17 jun. 2018.

UCSC Genome Browser Gateway. [s.d.]. Disponível em: <https://genome.ucsc.edu/cgi-bin/hgGateway>. Acesso em: 2 jun. 2018.


VIOLA, L. B. et al. Phylogenetic analyses based on small subunit rRNA and glycosomal glyceraldehyde-3-phosphate dehydrogenase genes and ultrastructural characterization of two snake Trypanosomes: Trypanosoma serpentis n. sp. from Pseudoboa nigra and Trypanosoma cascavelli from Crotalus durissus terrificus. The Journal of eukaryotic microbiology, [s. l.], v. 56, n. 6, p. 594–602, 2009. Disponível em: <http://dx.doi.org/10.1111/j.1550-7408.2009.00444.x>

VOTÝPKA, J. et al. New Approaches to Systematics of Trypanosomatidae: Criteria for Taxonomic (Re)description. Trends in parasitology, [s. l.], v. 31, n. 10, p. 460–469, 2015. Disponível em: <http://linkinghub.elsevier.com/retrieve/pii/S147149221500152X>. Acesso em: 15 nov. 2017.

WampServer. [s.d.]. Disponível em: <http://www.wampserver.com/en/>. Acesso em: 2 jun. 2018.

WELLING, L.; THOMSON, L. PHP and MySQL Web Development. [s.l.] : Addison-Wesley, 2016. Disponível em: <https://market.android.com/details?id=book-zlqOMQEACAAJ>

Yahoo Small Business Web Hosting. [s.d.]. Disponível em: <https://smallbusiness.yahoo.com/>. Acesso em: 2 jun. 2018.

YANG, S. Q. et al. Ecotin: a serine protease inhibitor with two distinct and interacting binding sites. Journal of molecular biology, [s. l.], v. 279, n. 4, p. 945–957, 1998. Disponível em: <http://dx.doi.org/10.1006/jmbi.1998.1748>


Glossary

BLAST Standard algorithm for aligning, searching and comparing genetic and protein sequence data.

CDS A sequence of nucleotides that encodes a protein in a gene.

CHP A gene that appears to encode a complete protein of unknown function or origin, deduced by computational analysis of genomic data.

CSS The standard style sheet language used in webpages to apply formatting to HTML pages.

CSS3 CSS version 3, the current version as of 2018.

FASTA Standard text-based format for representing genetic or amino acid sequence data.

GenBank GenBank is the National Institutes of Health's genetic sequence database, publicly and freely available online. The name also describes the standardized text file format in which individual records are presented.

HTML Standard markup language for creating Web pages.

HTML5 HTML version 5, the current version as of 2018.

ISP Inhibitor of serine peptidase. The ecotin homologs found in trypanosomatids are called ISPs in the literature.

JavaScript Scripting language that runs locally in web browsers to execute commands from dynamic web pages. It is the de facto standard for modern browsers.

MySQL Widely used open-source relational database management system, queried through SQL.

Notepad++ Free text editor with syntax highlighting for dozens of common programming languages.

PHP One of the most used interpreted server-side programming languages.

SQL Standardized query language for relational database interactions.

tBLASTn A type of BLAST search that takes a protein sequence as input and matches it to possible translations in stored genomic data.


WAMP A framework for web server deployment in the Windows operating system using the Apache web server, the PHP server-side interpreted programming language and the MySQL relational database system.

WampServer A freely available implementation of WAMP.