ruiheesi committed on
Commit
12ac14b
·
1 Parent(s): 9d87935

Added Application Files

Dockerfile ADDED
@@ -0,0 +1,47 @@
+ # Use the official Ubuntu 20.04 image as the base
+ FROM ubuntu:20.04
+
+ # Set environment variables
+ ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     wget \
+     bzip2 \
+     ca-certificates \
+     curl \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install Miniconda
+ ENV CONDA_DIR=/opt/conda
+ RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && \
+     /bin/bash /tmp/miniconda.sh -b -p $CONDA_DIR && \
+     rm /tmp/miniconda.sh
+
+ # Add Miniconda to the PATH
+ ENV PATH=$CONDA_DIR/bin:$PATH
+
+ # Create the Conda environment from environment.yml
+ COPY environment.yml /tmp/environment.yml
+ RUN conda env create -n caNanoLibrarian -f /tmp/environment.yml && \
+     rm /tmp/environment.yml
+
+ # Activate the Conda environment by default
+ ENV PATH=$CONDA_DIR/envs/caNanoLibrarian/bin:$PATH
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Copy your application files
+ COPY . /app
+
+ # Expose the container port
+ EXPOSE 5000
+
+ # Set environment variables (optional)
+ ENV FLASK_APP=app.py
+ ENV FLASK_RUN_HOST=0.0.0.0
+
+ # Define the command to run your application
+ CMD [ "python", "app.py" ]
LICENSE ADDED
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+ software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+ to take away your freedom to share and change the works. By contrast,
+ the GNU General Public License is intended to guarantee your freedom to
+ share and change all versions of a program--to make sure it remains free
+ software for all its users. We, the Free Software Foundation, use the
+ GNU General Public License for most of our software; it applies also to
+ any other work released this way by its authors. You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ them if you wish), that you receive source code or can get it if you
+ want it, that you can change the software or use pieces of it in new
+ free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+ these rights or asking you to surrender the rights. Therefore, you have
+ certain responsibilities if you distribute copies of the software, or if
+ you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must pass on to the recipients the same
+ freedoms that you received. You must make sure that they, too, receive
+ or can get the source code. And you must show them these terms so they
+ know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+ (1) assert copyright on the software, and (2) offer you this License
+ giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+ that there is no warranty for this free software. For both users' and
+ authors' sake, the GPL requires that modified versions be marked as
+ changed, so that their problems will not be attributed erroneously to
+ authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+ modified versions of the software inside them, although the manufacturer
+ can do so. This is fundamentally incompatible with the aim of
+ protecting users' freedom to change the software. The systematic
+ pattern of such abuse occurs in the area of products for individuals to
+ use, which is precisely where it is most unacceptable. Therefore, we
+ have designed this version of the GPL to prohibit the practice for those
+ products. If such problems arise substantially in other domains, we
+ stand ready to extend this provision to those domains in future versions
+ of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+ States should not allow patents to restrict development and use of
+ software on general-purpose computers, but in those that do, we wish to
+ avoid the special danger that patents applied to a free program could
+ make it effectively proprietary. To prevent this, the GPL assures that
+ patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+ works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+ License. Each licensee is addressed as "you". "Licensees" and
+ "recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+ in a fashion requiring copyright permission, other than the making of an
+ exact copy. The resulting work is called a "modified version" of the
+ earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+ on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification), making available to the
+ public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+ parties to make or receive copies. Mere interaction with a user through
+ a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+ to the extent that it includes a convenient and prominently visible
+ feature that (1) displays an appropriate copyright notice, and (2)
+ tells the user that there is no warranty for the work (except to the
+ extent that warranties are provided), that licensees may convey the
+ work under this License, and how to view a copy of this License. If
+ the interface presents a list of user commands or options, such as a
+ menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+ for making modifications to it. "Object code" means any non-source
+ form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+ standard defined by a recognized standards body, or, in the case of
+ interfaces specified for a particular programming language, one that
+ is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+ than the work as a whole, that (a) is included in the normal form of
+ packaging a Major Component, but which is not part of that Major
+ Component, and (b) serves only to enable use of the work with that
+ Major Component, or to implement a Standard Interface for which an
+ implementation is available to the public in source code form. A
+ "Major Component", in this context, means a major essential component
+ (kernel, window system, and so on) of the specific operating system
+ (if any) on which the executable work runs, or a compiler used to
+ produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+ the source code needed to generate, install, and (for an executable
+ work) run the object code and to modify the work, including scripts to
+ control those activities. However, it does not include the work's
+ System Libraries, or general-purpose tools or generally available free
+ programs which are used unmodified in performing those activities but
+ which are not part of the work. For example, Corresponding Source
+ includes interface definition files associated with source files for
+ the work, and the source code for shared libraries and dynamically
+ linked subprograms that the work is specifically designed to require,
+ such as by intimate data communication or control flow between those
+ subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+ can regenerate automatically from other parts of the Corresponding
+ Source.
+
+ The Corresponding Source for a work in source code form is that
+ same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+ copyright on the Program, and are irrevocable provided the stated
+ conditions are met. This License explicitly affirms your unlimited
+ permission to run the unmodified Program. The output from running a
+ covered work is covered by this License only if the output, given its
+ content, constitutes a covered work. This License acknowledges your
+ rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+ convey, without conditions so long as your license otherwise remains
+ in force. You may convey covered works to others for the sole purpose
+ of having them make modifications exclusively for you, or provide you
+ with facilities for running those works, provided that you comply with
+ the terms of this License in conveying all material for which you do
+ not control copyright. Those thus making or running the covered works
+ for you must do so exclusively on your behalf, under your direction
+ and control, on terms that prohibit them from making any copies of
+ your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+ the conditions stated below. Sublicensing is not allowed; section 10
+ makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+ measure under any applicable law fulfilling obligations under article
+ 11 of the WIPO copyright treaty adopted on 20 December 1996, or
+ similar laws prohibiting or restricting circumvention of such
+ measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+ circumvention of technological measures to the extent such circumvention
+ is effected by exercising rights under this License with respect to
+ the covered work, and you disclaim any intention to limit operation or
+ modification of the work as a means of enforcing, against the work's
+ users, your or third parties' legal rights to forbid circumvention of
+ technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+ receive it, in any medium, provided that you conspicuously and
+ appropriately publish on each copy an appropriate copyright notice;
+ keep intact all notices stating that this License and any
+ non-permissive terms added in accord with section 7 apply to the code;
+ keep intact all notices of the absence of any warranty; and give all
+ recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+ and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+ produce it from the Program, in the form of source code under the
+ terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+ works, which are not by their nature extensions of the covered work,
+ and which are not combined with it such as to form a larger program,
+ in or on a volume of a storage or distribution medium, is called an
+ "aggregate" if the compilation and its resulting copyright are not
+ used to limit the access or legal rights of the compilation's users
+ beyond what the individual works permit. Inclusion of a covered work
+ in an aggregate does not cause this License to apply to the other
+ parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+ of sections 4 and 5, provided that you also convey the
+ machine-readable Corresponding Source under the terms of this License,
+ in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+ from the Corresponding Source as a System Library, need not be
+ included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+ tangible personal property which is normally used for personal, family,
+ or household purposes, or (2) anything designed or sold for incorporation
+ into a dwelling. In determining whether a product is a consumer product,
+ doubtful cases shall be resolved in favor of coverage. For a particular
+ product received by a particular user, "normally used" refers to a
+ typical or common use of that class of product, regardless of the status
+ of the particular user or of the way in which the particular user
+ actually uses, or expects or is expected to use, the product. A product
+ is a consumer product regardless of whether the product has substantial
+ commercial, industrial or non-consumer uses, unless such uses represent
+ the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+ procedures, authorization keys, or other information required to install
+ and execute modified versions of a covered work in that User Product from
+ a modified version of its Corresponding Source. The information must
+ suffice to ensure that the continued functioning of the modified object
+ code is in no case prevented or interfered with solely because
+ modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+ specifically for use in, a User Product, and the conveying occurs as
+ part of a transaction in which the right of possession and use of the
+ User Product is transferred to the recipient in perpetuity or for a
+ fixed term (regardless of how the transaction is characterized), the
+ Corresponding Source conveyed under this section must be accompanied
+ by the Installation Information. But this requirement does not apply
+ if neither you nor any third party retains the ability to install
+ modified object code on the User Product (for example, the work has
+ been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+ requirement to continue to provide support service, warranty, or updates
+ for a work that has been modified or installed by the recipient, or for
+ the User Product in which it has been modified or installed. Access to a
+ network may be denied when the modification itself materially and
+ adversely affects the operation of the network or violates the rules and
+ protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+ in accord with this section must be in a format that is publicly
+ documented (and with an implementation available to the public in
+ source code form), and must require no special password or key for
+ unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+ License by making exceptions from one or more of its conditions.
+ Additional permissions that are applicable to the entire Program shall
+ be treated as though they were included in this License, to the extent
+ that they are valid under applicable law. If additional permissions
+ apply only to part of the Program, that part may be used separately
+ under those permissions, but the entire Program remains governed by
+ this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+ remove any additional permissions from that copy, or from any part of
+ it. (Additional permissions may be written to require their own
+ removal in certain cases when you modify the work.) You may place
+ additional permissions on material, added by you to a covered work,
+ for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+ add to a covered work, you may (if authorized by the copyright holders of
+ that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+ restrictions" within the meaning of section 10. If the Program as you
+ received it, or any part of it, contains a notice stating that it is
+ governed by this License along with a term that is a further
+ restriction, you may remove that term. If a license document contains
+ a further restriction but permits relicensing or conveying under this
+ License, you may add to a covered work material governed by the terms
+ of that license document, provided that the further restriction does
+ not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+ must place, in the relevant source files, a statement of the
+ additional terms that apply to those files, or a notice indicating
+ where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+ form of a separately written license, or stated as exceptions;
+ the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+ provided under this License. Any attempt otherwise to propagate or
+ modify it is void, and will automatically terminate your rights under
+ this License (including any patent licenses granted under the third
+ paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the copyright
+ holder fails to notify you of the violation by some reasonable means
+ prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from that
+ copyright holder, and you cure the violation prior to 30 days after
+ your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+ licenses of parties who have received copies or rights from you under
+ this License. If your rights have been terminated and not permanently
+ reinstated, you do not qualify to receive new licenses for the same
+ material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+ run a copy of the Program. Ancillary propagation of a covered work
+ occurring solely as a consequence of using peer-to-peer transmission
+ to receive a copy likewise does not require acceptance. However,
+ nothing other than this License grants you permission to propagate or
+ modify any covered work. These actions infringe copyright if you do
+ not accept this License. Therefore, by modifying or propagating a
+ covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+ receives a license from the original licensors, to run, modify and
+ propagate that work, subject to this License. You are not responsible
+ for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+ organization, or substantially all assets of one, or subdividing an
+ organization, or merging organizations. If propagation of a covered
+ work results from an entity transaction, each party to that
+ transaction who receives a copy of the work also receives whatever
+ licenses to the work the party's predecessor in interest had or could
+ give under the previous paragraph, plus a right to possession of the
+ Corresponding Source of the work from the predecessor in interest, if
+ the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+ rights granted or affirmed under this License. For example, you may
+ not impose a license fee, royalty, or other charge for exercise of
+ rights granted under this License, and you may not initiate litigation
+ (including a cross-claim or counterclaim in a lawsuit) alleging that
+ any patent claim is infringed by making, using, selling, offering for
+ sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+ License of the Program or a work on which the Program is based. The
+ work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+ owned or controlled by the contributor, whether already acquired or
+ hereafter acquired, that would be infringed by some manner, permitted
+ by this License, of making, using, or selling its contributor version,
+ but do not include claims that would be infringed only as a
+ consequence of further modification of the contributor version. For
+ purposes of this definition, "control" includes the right to grant
+ patent sublicenses in a manner consistent with the requirements of
+ this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+ patent license under the contributor's essential patent claims, to
+ make, use, sell, offer for sale, import and otherwise run, modify and
+ propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+ agreement or commitment, however denominated, not to enforce a patent
+ (such as an express permission to practice a patent or covenant not to
+ sue for patent infringement). To "grant" such a patent license to a
+ party means to make such an agreement or commitment not to enforce a
+ patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+ and the Corresponding Source of the work is not available for anyone
+ to copy, free of charge and under the terms of this License, through a
+ publicly available network server or other readily accessible means,
+ then you must either (1) cause the Corresponding Source to be so
+ available, or (2) arrange to deprive yourself of the benefit of the
+ patent license for this particular work, or (3) arrange, in a manner
+ consistent with the requirements of this License, to extend the patent
+ license to downstream recipients. "Knowingly relying" means you have
+ actual knowledge that, but for the patent license, your conveying the
+ covered work in a country, or your recipient's use of the covered work
+ in a country, would infringe one or more identifiable patents in that
+ country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+ arrangement, you convey, or propagate by procuring conveyance of, a
+ covered work, and grant a patent license to some of the parties
+ receiving the covered work authorizing them to use, propagate, modify
+ or convey a specific copy of the covered work, then the patent license
+ you grant is automatically extended to all recipients of the covered
+ work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+ the scope of its coverage, prohibits the exercise of, or is
+ conditioned on the non-exercise of one or more of the rights that are
+ specifically granted under this License. You may not convey a covered
+ work if you are a party to an arrangement with a third party that is
+ in the business of distributing software, under which you make payment
+ to the third party based on the extent of your activity of conveying
+ the work, and under which the third party grants, to any of the
+ parties who would receive the covered work from you, a discriminatory
+ patent license (a) in connection with copies of the covered work
+ conveyed by you (or copies made from those copies), or (b) primarily
+ for and in connection with specific products or compilations that
+ contain the covered work, unless you entered into that arrangement,
+ or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+ any implied license or other defenses to infringement that may
+ otherwise be available to you under applicable patent law.
+
540
+ 12. No Surrender of Others' Freedom.
541
+
542
+ If conditions are imposed on you (whether by court order, agreement or
543
+ otherwise) that contradict the conditions of this License, they do not
544
+ excuse you from the conditions of this License. If you cannot convey a
545
+ covered work so as to satisfy simultaneously your obligations under this
546
+ License and any other pertinent obligations, then as a consequence you may
547
+ not convey it at all. For example, if you agree to terms that obligate you
548
+ to collect a royalty for further conveying from those to whom you convey
549
+ the Program, the only way you could satisfy both those terms and this
550
+ License would be to refrain entirely from conveying the Program.
551
+
552
+ 13. Use with the GNU Affero General Public License.
553
+
554
+ Notwithstanding any other provision of this License, you have
555
+ permission to link or combine any covered work with a work licensed
556
+ under version 3 of the GNU Affero General Public License into a single
557
+ combined work, and to convey the resulting work. The terms of this
558
+ License will continue to apply to the part which is the covered work,
559
+ but the special requirements of the GNU Affero General Public License,
560
+ section 13, concerning interaction through a network will apply to the
561
+ combination as such.
562
+
563
+ 14. Revised Versions of this License.
564
+
565
+ The Free Software Foundation may publish revised and/or new versions of
566
+ the GNU General Public License from time to time. Such new versions will
567
+ be similar in spirit to the present version, but may differ in detail to
568
+ address new problems or concerns.
569
+
570
+ Each version is given a distinguishing version number. If the
571
+ Program specifies that a certain numbered version of the GNU General
572
+ Public License "or any later version" applies to it, you have the
573
+ option of following the terms and conditions either of that numbered
574
+ version or of any later version published by the Free Software
575
+ Foundation. If the Program does not specify a version number of the
576
+ GNU General Public License, you may choose any version ever published
577
+ by the Free Software Foundation.
578
+
579
+ If the Program specifies that a proxy can decide which future
580
+ versions of the GNU General Public License can be used, that proxy's
581
+ public statement of acceptance of a version permanently authorizes you
582
+ to choose that version for the Program.
583
+
584
+ Later license versions may give you additional or different
585
+ permissions. However, no additional obligations are imposed on any
586
+ author or copyright holder as a result of your choosing to follow a
587
+ later version.
588
+
589
+ 15. Disclaimer of Warranty.
590
+
591
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592
+ APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593
+ HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594
+ OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595
+ THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596
+ PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597
+ IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598
+ ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599
+
600
+ 16. Limitation of Liability.
601
+
602
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604
+ THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605
+ GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606
+ USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607
+ DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608
+ PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609
+ EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610
+ SUCH DAMAGES.
611
+
612
+ 17. Interpretation of Sections 15 and 16.
613
+
614
+ If the disclaimer of warranty and limitation of liability provided
615
+ above cannot be given local legal effect according to their terms,
616
+ reviewing courts shall apply local law that most closely approximates
617
+ an absolute waiver of all civil liability in connection with the
618
+ Program, unless a warranty or assumption of liability accompanies a
619
+ copy of the Program in return for a fee.
620
+
621
+ END OF TERMS AND CONDITIONS
622
+
623
+ How to Apply These Terms to Your New Programs
624
+
625
+ If you develop a new program, and you want it to be of the greatest
626
+ possible use to the public, the best way to achieve this is to make it
627
+ free software which everyone can redistribute and change under these terms.
628
+
629
+ To do so, attach the following notices to the program. It is safest
630
+ to attach them to the start of each source file to most effectively
631
+ state the exclusion of warranty; and each file should have at least
632
+ the "copyright" line and a pointer to where the full notice is found.
633
+
634
+ <one line to give the program's name and a brief idea of what it does.>
635
+ Copyright (C) <year> <name of author>
636
+
637
+ This program is free software: you can redistribute it and/or modify
638
+ it under the terms of the GNU General Public License as published by
639
+ the Free Software Foundation, either version 3 of the License, or
640
+ (at your option) any later version.
641
+
642
+ This program is distributed in the hope that it will be useful,
643
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
644
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645
+ GNU General Public License for more details.
646
+
647
+ You should have received a copy of the GNU General Public License
648
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
649
+
650
+ Also add information on how to contact you by electronic and paper mail.
651
+
652
+ If the program does terminal interaction, make it output a short
653
+ notice like this when it starts in an interactive mode:
654
+
655
+ <program> Copyright (C) <year> <name of author>
656
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657
+ This is free software, and you are welcome to redistribute it
658
+ under certain conditions; type `show c' for details.
659
+
660
+ The hypothetical commands `show w' and `show c' should show the appropriate
661
+ parts of the General Public License. Of course, your program's commands
662
+ might be different; for a GUI interface, you would use an "about box".
663
+
664
+ You should also get your employer (if you work as a programmer) or school,
665
+ if any, to sign a "copyright disclaimer" for the program, if necessary.
666
+ For more information on this, and how to apply and follow the GNU GPL, see
667
+ <https://www.gnu.org/licenses/>.
668
+
669
+ The GNU General Public License does not permit incorporating your program
670
+ into proprietary programs. If your program is a subroutine library, you
671
+ may consider it more useful to permit linking proprietary applications with
672
+ the library. If this is what you want to do, use the GNU Lesser General
673
+ Public License instead of this License. But first, please read
674
+ <https://www.gnu.org/licenses/why-not-lgpl.html>.
README.md CHANGED
@@ -1,11 +1,15 @@
1
  ---
2
- title: CaNanoLibrarian
3
- emoji: 📊
4
- colorFrom: green
5
  colorTo: purple
6
  sdk: docker
 
 
7
  pinned: false
8
- license: apache-2.0
9
  ---
 
 
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
1
  ---
2
+ title: caNanoLibrarian
3
+ emoji: 🐳
4
+ colorFrom: blue
5
  colorTo: purple
6
  sdk: docker
7
+ app_port: 7860
8
+ app_file: app.py
9
  pinned: false
 
10
  ---
11
+ # caNanoLibrarian
12
+ This is the backend of the caNanoLibrarian app, an LLM-based natural-language search interface for a structured database.
13
 
14
+ The application is hosted at https://cananolibrarian.azurewebsites.net/login. If you want to try it, the passcode is caNanoLibrarian_DEMO_wkrh_6152023*.
15
+ We have a limited budget for this project, so please let us know if you want to continue exploring it once GPT reports that the usage limit has been reached. Thanks.
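
The natural-language search described above can be sketched as a three-step flow: wrap the question and a schema summary in a prompt, ask a model for a SQL query, and run that query against SQLite. The block below is a minimal sketch under stated assumptions, not the app's implementation: `fake_llm` is a hypothetical stand-in for the chat-completion call, `build_prompt` and `answer` are illustrative names, and the in-memory table holds made-up rows. The prompt wording follows `sql_prompt` in `src/app_utils/sql_utils.py`.

```python
import sqlite3


def build_prompt(question: str, structure: str) -> str:
    """Wrap the question and a schema summary in a SQL-writing prompt."""
    return (
        "select appropriate table(s), write me a sql query to:\n"
        f"{question} \n\n(context: the table structure is: {structure})"
        " return sql query only."
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model call; always returns one fixed query."""
    return "SELECT sampleName FROM GeneralInfo WHERE createdYear = 2020;"


def answer(question: str, connection: sqlite3.Connection) -> tuple[list[tuple], str]:
    """Turn a question into SQL, run it, and return the rows plus the query."""
    structure = (
        "{table:GeneralInfo,attributes:"
        "[ID(Primary),sampleName,createdYear,createdMonth]}"
    )
    query = fake_llm(build_prompt(question, structure))
    rows = connection.execute(query).fetchall()
    return rows, query


# Illustrative in-memory database with made-up rows (an assumption for the demo).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE GeneralInfo "
    "(ID INT, sampleName TEXT, createdYear INT, createdMonth INT)"
)
connection.executemany(
    "INSERT INTO GeneralInfo VALUES (?, ?, ?, ?)",
    [(1, "sample-A", 2020, 5), (2, "sample-B", 2021, 1)],
)

rows, query = answer("List samples created in 2020", connection)
print(query)
print(rows)
```

Returning the generated query alongside the rows mirrors how the app shows the SQL next to the result table, which makes it easier to spot a query that does not match the question.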
__init__.py ADDED
@@ -0,0 +1,19 @@
1
+ import os
2
+ import sys
3
+
4
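+ dir_path = os.path.abspath(os.getcwd())
+ # Build paths with os.path.join so they resolve on Linux (e.g. the Docker image) as well as on Windows.
+ utils_path = os.path.join(dir_path, "src", "app_utils")
+ database_tool_path = os.path.join(dir_path, "src", "database_creation")
+ src_path = os.path.join(dir_path, "src")
+ data_path = os.path.join(dir_path, "data")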
9
+ sys.path.append(utils_path)
10
+ sys.path.append(src_path)
11
+ sys.path.append(data_path)
12
+ sys.path.append(database_tool_path)
13
+
14
+ # COMPLETIONS_MODEL = "gpt-3.5-turbo"
15
+ # EMBEDDING_MODEL = "text-embedding-ada-002"
16
+ # config_dir = dir_path + "\\src\\utils"
17
+ # config = configparser.ConfigParser()
18
+ # config.read(os.path.join(config_dir, 'gpt_local_config.cfg'))
19
+ # openai.api_key = config.get('token', 'GPT_TOKEN')
app.py ADDED
@@ -0,0 +1,88 @@
1
+ import os
2
+ import sys
3
+ import time
4
+ import openai
5
+ import configparser
6
+ import sqlite3
7
+
8
+ dir_path = os.path.abspath(os.getcwd())
9
+
10
+ utils_path = dir_path + "/src/app_utils"
11
+ src_path = dir_path + "/src"
12
+ sys.path.append(utils_path)
13
+ sys.path.append(src_path)
14
+
15
+ from flask import Flask, render_template, request, redirect, url_for, g
16
+ import application_utils as au
17
+
18
+ COMPLETIONS_MODEL = "gpt-3.5-turbo"
19
+ EMBEDDING_MODEL = "text-embedding-ada-002"
20
+ config_dir = utils_path
21
+ config = configparser.ConfigParser()
22
+ config.read(os.path.join(config_dir, 'gpt_local_config.cfg'))
23
+ # openai.api_key = config.get('token', 'GPT_TOKEN')
24
+ openai.api_key = os.environ.get("GPT_TOKEN")
25
+
26
+
27
+ # Specify the path to db file
28
+ db_name = 'caNanoData_Public.db'
29
+
30
+ COMPLETIONS_API_PARAMS = {
31
+ # We use temperature of 0.0 because it gives
32
+ # the most predictable, factual answer.
33
+ "temperature": 0.0,
34
+ "max_tokens": 400,
35
+ "model": "gpt-3.5-turbo"
36
+ }
37
+
38
+ app = Flask("caNanoLibrarian")
39
+
40
+
41
+ def get_db():
42
+ db = getattr(g, '_database', None)
43
+ if db is None:
44
+ db = g._database = sqlite3.connect(db_name)
45
+ return db
46
+
47
+
48
+ @app.teardown_appcontext
49
+ def close_db(exception):
50
+ db = getattr(g, '_database', None)
51
+ if db is not None:
52
+ db.close()
53
+
54
+
55
+ @app.template_filter('nl2br')
56
+ def nl2br_filter(s):
57
+ return s.replace('\n', '<br>')
58
+
59
+
60
+ @app.route('/', methods=['GET', 'POST'])
61
+ def index():
62
+
63
+ connection = get_db()
64
+
65
+ user_input = ""
66
+ # processed_input = None
67
+ if request.method == 'POST':
68
+ user_input = request.form['user_input']
69
+ result_df, query = au.custom_query(
70
+ user_input, connection,
71
+ GPT4=True,
72
+ print_prompt=False,
73
+ print_query=True,
74
+ print_time=False
75
+ )
76
+ return render_template(
77
+ 'index.html',
78
+ processed_input=result_df.to_html(
79
+ classes='dataframe custom-style',
80
+ index=False),
81
+ source_sections=query,
82
+ user_input=user_input)
83
+
84
+ return render_template('index.html')
85
+
86
+
87
+ if __name__ == '__main__':
88
+ app.run(host='0.0.0.0', port=7860)
environment.yml ADDED
@@ -0,0 +1,12 @@
1
+ name: caNanoLibrarian
2
+ channels:
3
+ - conda-forge
4
+ - defaults
5
+ dependencies:
6
+ - python=3.10.9
7
+ - openai=0.27.5
8
+ - numpy=1.24.3
9
+ - pandas=2.0.1
10
+ - tiktoken=0.4.0
11
+ - configparser=5.3.0
12
+ - flask=2.3.2
src/app_utils/application_utils.py ADDED
@@ -0,0 +1,154 @@
1
+ import sql_utils as su
2
+ import gpt_utils as gu
3
+ import time
4
+
5
+
6
+ def database_structure_strings():
7
+
8
+ NanoEntDes_structure = "{table:NanoEntDes,attributes:" + \
9
+ "[ID(Primary),NanoEntity,Description]}"
10
+
11
+ NanoEntCom_structure = "{table:NanoEntCom,attributes:" + \
12
+ "[ID(Primary),NanoEntity,Composition,CompositionType," + \
13
+ "MolecularWeight,PubChemID]}"
14
+
15
+ FuncEntDes_structure = "{table: FuncEntDes, attributes:" + \
16
+ "[ID(Primary),FunctionEntity,FunctionEntityType," + \
17
+ "Description,ActivationMethod,pubChemID," + \
18
+ "MolarMass,MolarMassUnit]}"
19
+
20
+ FuncEntFunction_structure = "{table: FuncEntFunction," + \
21
+ "attributes:[ID(Primary),FunctionEntiry,Function," + \
22
+ "FunctionDescription]}"
23
+
24
+ ChemAsso_structure = "{table: ChemAsso, attributes:" + \
25
+ "[ID(Primary),AssociationType,BondType,Description," + \
26
+ "dataId,ComposingElementNameA,ComposingElementNameB," + \
27
+ "CompositiontypeB,CompositiontypeA,DomainElementNameB," + \
28
+ "DomainElementNameA,DomainAssociationId,ComposingElemetIdB," + \
29
+ "ComposingElemetIdA,ComposingElementTypeA,EntityDisplayNameB," + \
30
+ "ComposingElementTypeB,EntityDisplayNameA,AttachmentId]}"
31
+
32
+ GeneralInfo_structure = "{table:GeneralInfo,attributes:" + \
33
+ "[ID(Primary),sampleName,createdYear,createdMonth]}"
34
+
35
+ SampleKeywords_structure = "{table:SampleKeyWords,attributes:" + \
36
+ "[ID(Primary),sampleName,SampleKeyWord]}"
37
+
38
+ PublicationInfo_structure = "{table:PublicationInfo," + \
39
+ "attributes:[ID(Primary),PMID,year,title,author," + \
40
+ "journal,publicationCategories,description]}"
41
+
42
+ PublicationKeyWords_structure = "{table:PublicationKeyWords," + \
43
+ "attributes:[ID(Primary),sampleName,SampleKeyWord]}"
44
+
45
+ CharacterizationInfo_structure = "{table:CharacterizationInfo," + \
46
+ "attributes:[ID(Primary),CharType,CharName," + \
47
+ "AssayType,Protocol, " + \
48
+ "DesignDescription,AnalysisAndConclusion]}"
49
+
50
+ CharExpConfig_structure = "{table:CharExpConfig," + \
51
+ "attributes:[ID(Primary),CharType,CharName," + \
52
+ "AssayType,ExpConfigTechnique, " + \
53
+ "ExpConfigInstruments,ExpConfigDescription]}"
54
+
55
+ CharResultDescriptions_structure = "{table:CharResultDescriptions," + \
56
+ "attributes:[ID(Primary),CharType,CharName," + \
57
+ "AssayType,CharResultDescription]}"
58
+
59
+ CharResultKeywords_structure = "{table:CharResultKeywords," + \
60
+ "attributes:[ID(Primary),CharType,CharName," + \
61
+ "AssayType,CharResultKeyword]}"
62
+
63
+ CharResultTables_structure = "{table:CharResultTables," + \
64
+ "attributes:[ID(Primary),CharType,CharName," + \
65
+ "AssayType,CharTable]}"
66
+
67
+ note1 = "NanoEntCom should not be used to count unique " + \
68
+ "NanoEntity\ncount composition should only use composition table"
69
+
70
+ note2 = "ALL tables should join on ID. Table NanoEntDes and " + \
71
+ "NanoEntCom share key NanoEntity, " + \
72
+ "FuncEntDes and FuncEntFunction share FunctionEntity key, " + \
73
+ "ChemAsso does not have other common keys with other tables."
74
+
75
+ note3 = "Refer strictly to the table and column names in this context. "
76
+
77
+ Overall_structure = [
78
+ "\n",
79
+ NanoEntDes_structure,
80
+ NanoEntCom_structure,
81
+ FuncEntDes_structure,
82
+ FuncEntFunction_structure,
83
+ ChemAsso_structure,
84
+ GeneralInfo_structure,
85
+ SampleKeywords_structure,
86
+ PublicationInfo_structure,
87
+ PublicationKeyWords_structure,
88
+ CharacterizationInfo_structure,
89
+ CharExpConfig_structure,
90
+ CharResultDescriptions_structure,
91
+ CharResultKeywords_structure,
92
+ CharResultTables_structure,
93
+ note1,
94
+ note2,
95
+ note3
96
+ ]
97
+
98
+ Overall_structure_string = "\n".join(Overall_structure)
99
+
100
+ return Overall_structure_string
101
+
102
+
103
+ def custom_query(
104
+ question,
105
+ connection,
106
+ GPT4=True,
107
+ print_prompt=False,
108
+ print_token=False,
109
+ print_query=False,
110
+ print_time=False
111
+ ):
112
+
113
+ stucture = database_structure_strings()
114
+
115
+ context_token_count = gu.num_tokens_from_string(
116
+ stucture
117
+ )
118
+
119
+ prompt = su.sql_prompt(
120
+ question,
121
+ stucture
122
+ )
123
+
124
+ prompt_token_count = gu.num_tokens_from_string(prompt)
125
+
126
+ if print_prompt:
127
+ print(prompt)
128
+
129
+ if print_token:
130
+ print(f"\nContext token count: {context_token_count}")
131
+ print(f"Prompt token count: {prompt_token_count}")
132
+
133
+ start_time = time.time()
134
+
135
+ if GPT4:
136
+ query = gu.quick_ask(prompt, model_num=0)
137
+ else:
138
+ query = gu.quick_ask(prompt, model_num=1)
139
+
140
+ if print_query:
141
+ print("\n============= The Query is: ===============\n")
142
+ print(query)
143
+ print("\n===========================================\n")
144
+
145
+ result_df = su.submit_querry(query, connection)
146
+ # End the timer
147
+ end_time = time.time()
148
+ # Calculate the execution time
149
+ execution_time = end_time - start_time
150
+ # Print the execution time
151
+ if print_time:
152
+ print("\nExecution Time:", execution_time, "seconds")
153
+
154
+ return result_df, query
src/app_utils/gpt_local_config.cfg ADDED
@@ -0,0 +1,5 @@
1
+ [token]
2
+ GPT_TOKEN =
3
+ [model]
4
+ model_for_fine_tune = davinci
5
+ model_for_chat = gpt-3.5-turbo
src/app_utils/gpt_utils.py ADDED
@@ -0,0 +1,44 @@
1
+ import configparser
2
+ import os
3
+ import openai
4
+ import tiktoken
5
+ import os.path
6
+
7
+
8
+ config_dir = os.path.dirname(__file__)
9
+ config = configparser.ConfigParser()
10
+ config.read(os.path.join(config_dir, 'gpt_local_config.cfg'))
11
+
12
+ # openai.api_key = config.get('token', 'GPT_TOKEN')
13
+ openai.api_key = os.environ.get("GPT_TOKEN")
14
+
15
+ model_for_chat = config.get('model', 'model_for_chat')
16
+
17
+
18
+ # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
19
+ def num_tokens_from_string(string: str, model="gpt-3.5-turbo") -> int:
20
+ """Returns the number of tokens in a text string."""
21
+ encoding = tiktoken.encoding_for_model(model)
22
+ num_tokens = len(encoding.encode(string))
23
+
24
+ return num_tokens
25
+
26
+
27
+ def quick_ask(prompt,
28
+ model_num=1,
29
+ max_tokens=500):
30
+
31
+ model = ["gpt-4-0314",
32
+ "gpt-3.5-turbo",
33
+ "gpt-4-32k",
34
+ "gpt-3.5-turbo-16k"]
35
+
36
+ response = openai.ChatCompletion.create(
37
+ model=model[model_num],
38
+ messages=[{"role": "user",
39
+ "content": prompt}],
40
+ temperature=0,
41
+ max_tokens=max_tokens
42
+ )
43
+
44
+ return response.choices[0]['message']['content']
src/app_utils/sql_utils.py ADDED
@@ -0,0 +1,103 @@
1
+ import pandas as pd
2
+ import sqlite3
3
+
4
+
5
+ def show_table_columns(connection, table):
6
+
7
+ query = f"PRAGMA table_info({table});"
8
+ table_columns = simple_querry(
9
+ connection,
10
+ query
11
+ )
12
+
13
+ for item in table_columns:
14
+ print(f"Column: {item[1]}, Type: {item[2]}")
15
+
16
+
17
+ def construct_db(
18
+ db_name,
19
+ overview_dict
20
+ ):
21
+
22
+ # Create a SQLite database connection
23
+ conn = sqlite3.connect(db_name)
24
+
25
+ for key in overview_dict:
26
+ overview_dict[key].to_sql(
27
+ key,
28
+ conn,
29
+ if_exists='replace',
30
+ index=False)
31
+
32
+ # Close the database connection
33
+ conn.close()
34
+
35
+ return "Done"
36
+
37
+
38
+ def submit_querry(
39
+ query,
40
+ connection
41
+ ):
42
+
43
+ try:
44
+ # Create a cursor object to execute SQL queries
45
+ cursor = connection.cursor()
46
+
47
+ # Execute the SQL query
48
+ cursor.execute(query)
49
+
50
+ # Fetch the column names
51
+ column_names = [description[0] for description in cursor.description]
52
+
53
+ # Fetch all the results
54
+ results = cursor.fetchall()
55
+
56
+ # Combine the column names with the query result
57
+ # header = ','.join(column_names)
58
+
59
+ # Close the cursor; the connection stays open for the caller
60
+ cursor.close()
61
+
62
+ # Create a DataFrame from the results
63
+ df = pd.DataFrame(results, columns=column_names)
64
+
65
+ # Return the DataFrame
66
+ return df
67
+
68
+ except Exception as e:
69
+ # Return the error message if an exception occurs
70
+ error_message = str(e)
71
+ df = pd.DataFrame({'Error': [error_message]})
72
+ return df
73
+
74
+
75
+ def sql_prompt(question, stucture):
76
+ header = "select appropriate table(s), write me a sql query to:\n"
77
+ tail = " return sql query only."
78
+
79
+ prompt = header + \
80
+ question + " " + \
81
+ "\n\n(context: the table structure is: " + \
82
+ stucture + ")" + \
83
+ tail
84
+
85
+ return prompt
86
+
87
+
88
+ def simple_querry(connection, querry):
89
+ cursor = connection.cursor()
90
+
91
+ # Execute the SQL query
92
+ cursor.execute(querry)
93
+
94
+ # Fetch all the results
95
+ results = cursor.fetchall()
96
+
97
+ return results
98
+
99
+
100
+ def show_tables(connection):
101
+ show_tables = "SELECT name\nFROM sqlite_master\nWHERE type = 'table'"
102
+ tables = simple_querry(connection, show_tables)
103
+ print(tables)
src/database_creation/characterization_info_creation.py ADDED
@@ -0,0 +1,404 @@
1
+ import csv
2
+ import sql_utils as su
3
+ from tqdm.notebook import tqdm_notebook
4
+
5
+
6
+ def create_characterization_dt_tables(connection):
7
+
8
+ CharacterizationInfo_str = "CREATE TABLE IF NOT EXISTS " + \
9
+ "CharacterizationInfo " + \
10
+ "(ID INT, CharType VARCHAR(150), CharName VARCHAR(150), " + \
11
+ "AssayType VARCHAR(150), Protocol TEXT, " + \
12
+ "DesignDescription TEXT, AnalysisAndConclusion TEXT);"
13
+
14
+ CharExpConfig_str = "CREATE TABLE IF NOT EXISTS " + \
15
+ "CharExpConfig " + \
16
+ "(ID INT, CharType VARCHAR(150), CharName VARCHAR(150), " + \
17
+ "AssayType VARCHAR(150), ExpConfigTechnique TEXT, " + \
18
+ "ExpConfigInstruments TEXT, ExpConfigDescription TEXT);"
19
+
20
+ CharResultDescriptions_str = "CREATE TABLE IF NOT EXISTS " + \
21
+ "CharResultDescriptions " + \
22
+ "(ID INT, CharType VARCHAR(150), CharName VARCHAR(150), " + \
23
+ "AssayType VARCHAR(150), CharResultDescription TEXT);"
24
+
25
+ CharResultKeywords_str = "CREATE TABLE IF NOT EXISTS " + \
26
+ "CharResultKeywords " + \
27
+ "(ID INT, CharType VARCHAR(150), CharName VARCHAR(150), " + \
28
+ "AssayType VARCHAR(150), CharResultKeyword VARCHAR(150)); "
29
+
30
+ CharResultTables_str = "CREATE TABLE IF NOT EXISTS " + \
31
+ "CharResultTables " + \
32
+ "(ID INT, CharType VARCHAR(150), CharName VARCHAR(150), " + \
33
+ "AssayType VARCHAR(150), CharTable TEXT); "
34
+
35
+ table_creation_querys = [
36
+ CharacterizationInfo_str,
37
+ CharExpConfig_str,
38
+ CharResultDescriptions_str,
39
+ CharResultKeywords_str,
40
+ CharResultTables_str
41
+ ]
42
+
43
+ for query in table_creation_querys:
44
+ results = su.simple_querry(
45
+ connection,
46
+ query
47
+ )
48
+ if len(results) == 0:
49
+ print("Table created or already exists")
50
+ else:
51
+ print(results)
52
+
53
+ su.show_tables(connection)
54
+
55
+
56
+ def characterization_to_sql(
57
+ characterization_dt,
58
+ connection
59
+ ):
60
+
61
+ total_ids = len(characterization_dt)
62
+ no_sample_list = []
63
+ with tqdm_notebook(
64
+ total=total_ids,
65
+ desc='Processing',
66
+ unit='ID'
67
+ ) as progress_bar:
68
+ cursor = connection.cursor()
69
+
70
+ for ID in characterization_dt:
71
+ sample_info = characterization_dt[ID]
72
+
73
+ for CharType_info in sample_info:
74
+ if 'type' not in CharType_info:
75
+ print('There is no characterization with your sample.')
76
+ no_sample_list.append([ID, CharType_info])
77
+ continue
78
+ CharType = CharType_info['type']
79
+ CharType_Info_Assay = CharType_info['charsByAssayType']
80
+ for CharName in CharType_Info_Assay:
81
+ CharName_infos = CharType_Info_Assay[CharName]
82
+ for charname_info in CharName_infos:
83
+ displayableItems = charname_info['displayableItems']
84
+ # Create a cursor object
85
+ for item in displayableItems:
86
+ if item['name'] == 'Assay Type':
87
+ AssayType = item['value']
88
+
89
+ if item['name'] == 'Protocol':
90
+ Protocol = item['value']
91
+
92
+ if item['name'] == 'Design Description':
93
+ DesignDescription = item['value']
94
+
95
+ if item['name'] == 'Experiment Configurations':
96
+ exp_config_list = item['value']
97
+
98
+ if item['name'] == 'Characterization Results':
99
+ char_results = item['value']
100
+
101
+ if item['name'] == 'Analysis and Conclusion':
102
+ AnalysisAndConclusion = item['value']
103
+
104
+ n_row = len(exp_config_list[0]['Technique'])
105
+ for i in range(n_row):
106
+ ExpConfigTechnique = exp_config_list[
107
+ 0
108
+ ]['Technique'][i]
109
+
110
+ ExpConfigInstruments = exp_config_list[
111
+ 1
112
+ ]['Instruments'][i]
113
+
114
+ ExpConfigDescription = exp_config_list[
115
+ 2
116
+ ]['Description'][i]
117
+
118
+ # Write to CharExpConfig
119
+ CharExpConfig_insert = (
120
+ ID,
121
+ CharType,
122
+ CharName,
123
+ AssayType,
124
+ ExpConfigTechnique,
125
+ ExpConfigInstruments,
126
+ ExpConfigDescription
127
+ )
128
+
129
+ # Execute a SELECT statement to check
130
+ # if the entry already exists
131
+
132
+ search_query = "SELECT COUNT(*) FROM " + \
133
+ "CharExpConfig" + \
134
+ " WHERE ID = ?" + \
135
+ " AND CharType = ?" + \
136
+ " AND CharName = ?" + \
137
+ " AND AssayType = ?" + \
138
+ " AND ExpConfigTechnique = ?" + \
139
+ " AND ExpConfigInstruments = ?;"
140
+
141
+ cursor.execute(
142
+ search_query,
143
+ CharExpConfig_insert[:6]
144
+ )
145
+ count = cursor.fetchone()[0]
146
+
147
+ # Check the count to determine
148
+ # if the entry exists
149
+ if count == 0:
150
+ # Entry does not exist, proceed
151
+ # with insertion
152
+ insert_query = "INSERT INTO " + \
153
+ "CharExpConfig " + \
154
+ "(ID, " + \
155
+ "CharType, " + \
156
+ "CharName, " + \
157
+ "AssayType, " + \
158
+ "ExpConfigTechnique, " + \
159
+ "ExpConfigInstruments, " + \
160
+ "ExpConfigDescription) " + \
161
+ "VALUES (?, ?, ?, ?, ?, ?, ?)"
162
+ cursor.execute(
163
+ insert_query,
164
+ CharExpConfig_insert
165
+ )
166
+ connection.commit()
167
+
168
+ for char_result in char_results:
169
+ if 'Data and Conditions' in char_result:
170
+ table_list = char_result[
171
+ 'Data and Conditions'
172
+ ]
173
+ CharTable = ""
174
+ for item in table_list:
175
+ tsc = ",".join(
176
+ item[
177
+ 'value'
178
+ ]
179
+ )
180
+ CharTable += tsc
181
+ CharTable += ";"
182
+ # Write to CharResultTables
183
+ CRTables_insert = (
184
+ ID,
185
+ CharType,
186
+ CharName,
187
+ AssayType,
188
+ CharTable
189
+ )
190
+
191
+ # Execute a SELECT
192
+ # statement to check
193
+ # if the entry already exists
194
+
195
+ search_query = "SELECT " + \
196
+ "COUNT(*) FROM " + \
197
+ "CharResultTables" + \
198
+ " WHERE ID = ?" + \
199
+ " AND CharType = ?" + \
200
+ " AND CharName = ?" + \
201
+ " AND AssayType = ?;"
202
+
203
+ cursor.execute(
204
+ search_query,
205
+ CRTables_insert[:4]
206
+ )
207
+ count = cursor.fetchone()[0]
208
+
209
+ # Check the count to determine
210
+ # if the entry exists
211
+ if count == 0:
212
+ # Entry does not exist, proceed
213
+ # with insertion
214
+ insert_query = "INSERT " + \
215
+ "INTO " + \
216
+ "CharResultTables " + \
217
+ "(ID, " + \
218
+ "CharType, " + \
219
+ "CharName, " + \
220
+ "AssayType, " + \
221
+ "CharTable) " + \
222
+ "VALUES (?, ?, ?, ?, ?)"
223
+ cursor.execute(
224
+ insert_query,
225
+ CRTables_insert
226
+ )
227
+ connection.commit()
228
+ if 'Files' in char_result:
229
+ file_list = char_result[
230
+ 'Files'
231
+ ]
232
+
233
+ for char_file in file_list:
234
+ if 'description' in char_file:
235
+ CRDes = char_file[
236
+ 'description'
237
+ ]
238
+ else:
239
+ CRDes = "None"
240
+
241
+ if 'keywordsString' in char_file:
242
+ CRKWstr_ls = char_file[
243
+ 'keywordsString'
244
+ ].split(",")
245
+ else:
246
+ CRKWstr_ls = ["None"]
247
+
248
+ CRDes_insert = (
249
+ ID,
250
+ CharType,
251
+ CharName,
252
+ AssayType,
253
+ CRDes
254
+ )
255
+
256
+ # Execute a SELECT
257
+ # statement to check
258
+ # if the entry already exists
259
+
260
+ search_query = "SELECT " + \
261
+ "COUNT(*) FROM " + \
262
+ "CharResultDescriptions" + \
263
+ " WHERE ID = ?" + \
264
+ " AND CharType = ?" + \
265
+ " AND CharName = ?" + \
266
+ " AND AssayType = ?;"
267
+
268
+ cursor.execute(
269
+ search_query,
270
+ CRDes_insert[:4]
271
+ )
272
+ count = cursor.fetchone()[0]
273
+
274
+ # Check the count to determine
275
+ # if the entry exists
276
+ if count == 0:
277
+ # Entry does not exist, proceed
278
+ # with insertion
279
+ insert_query = "INSERT " + \
280
+ "INTO " + \
281
+ "CharResultDescriptions" +\
282
+ " (ID, " + \
283
+ "CharType, " + \
284
+ "CharName, " + \
285
+ "AssayType, " + \
286
+ "CharResultDescription" + \
287
+ ") " + \
288
+ "VALUES (?, ?, ?, ?, ?)"
289
+
290
+ cursor.execute(
291
+ insert_query,
292
+ CRDes_insert
293
+ )
294
+ connection.commit()
295
+
296
+ for CRKW in CRKWstr_ls:
297
+ CRKW_insert = (
298
+ ID,
299
+ CharType,
300
+ CharName,
301
+ AssayType,
302
+ CRKW
303
+ )
304
+
305
+ # Execute a SELECT
306
+ # statement to check
307
+ # if the entry already exists
308
+
309
+ search_query = "SELECT " + \
310
+ "COUNT(*) FROM " + \
311
+ "CharResultKeywords" + \
312
+ " WHERE ID = ?" + \
313
+ " AND CharType = ?" + \
314
+ " AND CharName = ?" + \
315
+ " AND AssayType = ?" + \
316
+ " AND " + \
317
+ "CharResultKeyword = ?;"
318
+
319
+ cursor.execute(
320
+ search_query,
321
+ CRKW_insert
322
+ )
323
+ count = cursor.fetchone()[0]
324
+
325
+ # Check the count to determine
326
+ # if the entry exists
327
+ if count == 0:
328
+ # Entry does not exist,
329
+ # proceed
330
+ # with insertion
331
+ insert_query = "INSERT " +\
332
+ "INTO " + \
333
+ "CharResultKeywords" +\
334
+ " (ID, " + \
335
+ "CharType, " + \
336
+ "CharName, " + \
337
+ "AssayType, " + \
338
+ "CharResultKeyword" + \
339
+ ") " + \
340
+ "VALUES (?, ?, " + \
341
+ "?, ?, ?)"
342
+
343
+ cursor.execute(
344
+ insert_query,
345
+ CRKW_insert
346
+ )
347
+ connection.commit()
348
+
349
+ # Write to CharacterizationInfo
350
+ CharacterizationInfo_insert = (
351
+ ID,
352
+ CharType,
353
+ CharName,
354
+ AssayType,
355
+ Protocol,
356
+ DesignDescription,
357
+ AnalysisAndConclusion
358
+ )
359
+
360
+ # Execute a SELECT statement to check
361
+ # if the entry already exists
362
+
363
+ search_query = "SELECT COUNT(*) FROM " + \
364
+ "CharacterizationInfo" + \
365
+ " WHERE ID = ?" + \
366
+ " AND CharType = ?" + \
367
+ " AND CharName = ?" + \
368
+ " AND AssayType = ?" + \
369
+ " AND Protocol = ?;"
370
+
371
+ cursor.execute(
372
+ search_query,
373
+ CharacterizationInfo_insert[:5]
374
+ )
375
+ count = cursor.fetchone()[0]
376
+
377
+ # Check the count to determine if the entry exists
378
+ if count == 0:
379
+ # Entry does not exist, proceed with insertion
380
+ insert_query = "INSERT INTO " + \
381
+ "CharacterizationInfo " + \
382
+ "(ID, " + \
383
+ "CharType, " + \
384
+ "CharName, " + \
385
+ "AssayType, " + \
386
+ "Protocol, " + \
387
+ "DesignDescription, " + \
388
+ "AnalysisAndConclusion) " + \
389
+ "VALUES (?, ?, ?, ?, ?, ?, ?)"
390
+ cursor.execute(
391
+ insert_query,
392
+ CharacterizationInfo_insert
393
+ )
394
+ connection.commit()
395
+ progress_bar.update(1)
396
+ cursor.close()
397
+ no_sample_csv = 'no_characterization.csv'
398
+ # Specify the filename for the CSV file
399
+
400
+ with open(no_sample_csv, 'w', newline='') as file:
401
+ writer = csv.writer(file)
402
+ writer.writerows(no_sample_list)
403
+
404
+ print(f"CSV file '{no_sample_csv}' has been created.")
src/database_creation/chemical_association_creation.py ADDED
@@ -0,0 +1,149 @@
1
+ import data_utils as du
2
+ import sql_utils as su
3
+ from tqdm.notebook import tqdm_notebook
4
+
5
+
6
+ def create_ChemAsso_tables(connection):
7
+
8
+ table_String_ChemAssoName = "CREATE TABLE IF NOT EXISTS ChemAsso (" + \
9
+ "ID INT, " + \
10
+ "AssociationType VARCHAR(150), " + \
11
+ "BondType VARCHAR(150), " + \
12
+ "Description TEXT, " + \
13
+ "dataId INT, " + \
14
+ "ComposingElementNameA VARCHAR(150), " + \
15
+ "ComposingElementNameB VARCHAR(150), " + \
16
+ "CompositiontypeB VARCHAR(150), " + \
17
+ "CompositiontypeA VARCHAR(150), " + \
18
+ "DomainElementNameB VARCHAR(150), " + \
19
+ "DomainElementNameA VARCHAR(150), " + \
20
+ "DomainAssociationId INT, " + \
21
+ "ComposingElemetIdB INT, " + \
22
+ "ComposingElemetIdA INT, " + \
23
+ "ComposingElementTypeA VARCHAR(150), " + \
24
+ "EntityDisplayNameB VARCHAR(150), " + \
25
+ "ComposingElementTypeB VARCHAR(150), " + \
26
+ "EntityDisplayNameA VARCHAR(150), " + \
27
+ "AttachmentId INT);"
28
+
29
+ table_creation_querys = [
30
+ table_String_ChemAssoName
31
+ ]
32
+
33
+ for query in table_creation_querys:
34
+ results = su.simple_querry(
35
+ connection,
36
+ query
37
+ )
38
+ if len(results) == 0:
39
+ print("Table Exists")
40
+ else:
41
+ print(results)
42
+ su.show_tables(connection)
43
+
44
+
45
+ def chemicalassociation_to_sql(
46
+ composition_dt,
47
+ connection
48
+ ):
49
+
50
+ chemicalassociation_dt = {}
51
+
52
+ du.parse_dictionary(
53
+ composition_dt,
54
+ "chemicalassociation",
55
+ chemicalassociation_dt
56
+ )
57
+
58
+ total_ids = len(chemicalassociation_dt)
59
+
60
+ with tqdm_notebook(
61
+ total=total_ids,
62
+ desc='Processing',
63
+ unit='ID'
64
+ ) as progress_bar:
65
+
66
+ # Create a cursor object
67
+ cursor = connection.cursor()
68
+
69
+ for ID in chemicalassociation_dt:
70
+ sample_info = chemicalassociation_dt[ID]
71
+ for AssociationType in sample_info:
72
+ AssociationType_info = sample_info[AssociationType]
73
+ for Entry in AssociationType_info:
74
+ # Elements linked by this association
75
+ elements = Entry['AssocitedElements']
76
+
77
+ # Write to ChemAsso
78
+ ChemAsso_insert = (
79
+ ID,
80
+ AssociationType,
81
+ Entry['BondType'],
82
+ Entry['Description'],
83
+ Entry['dataId'],
84
+ elements['ComposingElementNameA'],
85
+ elements['ComposingElementNameB'],
86
+ elements['CompositiontypeB'],
87
+ elements['CompositiontypeA'],
88
+ elements['DomainElementNameB'],
89
+ elements['DomainElementNameA'],
90
+ elements['DomainAssociationId'],
91
+ elements['ComposingElemetIdB'],
92
+ elements['ComposingElemetIdA'],
93
+ elements['ComposingElementTypeA'],
94
+ elements['EntityDisplayNameB'],
95
+ elements['ComposingElementTypeB'],
96
+ elements['EntityDisplayNameA'],
97
+ Entry['AttachmentId']
98
+ )
99
+
100
+ # Execute a SELECT statement to check
101
+ # if the entry already exists
102
+ search_query = "SELECT COUNT(*) " + \
103
+ "FROM ChemAsso WHERE " + \
104
+ "ID = ? AND AssociationType = ? " + \
105
+ "AND BondType = ? " + \
106
+ "AND Description = ? " + \
107
+ "AND dataId = ?;"
108
+
109
+ cursor.execute(search_query, ChemAsso_insert[:5])
110
+ count = cursor.fetchone()[0]
111
+
112
+ # Check the count to determine if the entry exists
113
+ if count == 0:
114
+ # Entry does not exist, proceed with insertion
115
+ insert_query = "INSERT INTO ChemAsso (" + \
116
+ "ID, " + \
117
+ "AssociationType, " + \
118
+ "BondType, " + \
119
+ "Description, " + \
120
+ "dataId, " + \
121
+ "ComposingElementNameA, " + \
122
+ "ComposingElementNameB, " + \
123
+ "CompositiontypeB, " + \
124
+ "CompositiontypeA, " + \
125
+ "DomainElementNameB, " + \
126
+ "DomainElementNameA, " + \
127
+ "DomainAssociationId, " + \
128
+ "ComposingElemetIdB, " + \
129
+ "ComposingElemetIdA, " + \
130
+ "ComposingElementTypeA, " + \
131
+ "EntityDisplayNameB, " + \
132
+ "ComposingElementTypeB, " + \
133
+ "EntityDisplayNameA, " + \
134
+ "AttachmentId) " + \
135
+ "VALUES (?, ?, ?, ?, " + \
136
+ "?, ?, ?, ?, ?, ?, ?, ?, " + \
137
+ "?, ?, ?, ?, ?, ? , ?)"
138
+
139
+ cursor.execute(
140
+ insert_query,
141
+ ChemAsso_insert
142
+ )
143
+
144
+ connection.commit()
145
+ # else:
146
+ # # Entry already exists, skip
147
+ # print("Entry already exists, skipping...")
148
+ progress_bar.update(1)
149
+ cursor.close()
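Review note: every `*_to_sql` function in this commit repeats the same "SELECT COUNT(*), then INSERT if the count is zero" sequence. A minimal sketch of how that pattern could be factored out is shown here. The helper name `insert_if_absent` and its `key_len` parameter are hypothetical and not part of the commit; the table layout is a shortened stand-in for `ChemAsso`.

```python
import sqlite3


def insert_if_absent(connection, table, columns, values, key_len=None):
    """Insert `values` unless a row matching the first `key_len` columns exists.

    `table` and `columns` are interpolated into the SQL text, so they must come
    from trusted code. Only `values` are bound as parameters.
    Returns True when a row was inserted, False when it was skipped.
    """
    key_len = len(columns) if key_len is None else key_len
    where = " AND ".join(f"{col} = ?" for col in columns[:key_len])
    cursor = connection.cursor()
    try:
        cursor.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {where};",
            values[:key_len],
        )
        if cursor.fetchone()[0] > 0:
            return False
        placeholders = ", ".join("?" for _ in columns)
        cursor.execute(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            values,
        )
        connection.commit()
        return True
    finally:
        cursor.close()


# Usage with an in-memory database and a shortened, illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ChemAsso "
    "(ID INT, AssociationType VARCHAR(150), BondType VARCHAR(150));"
)
cols = ("ID", "AssociationType", "BondType")
row = (42, "attachment", "covalent")
first = insert_if_absent(conn, "ChemAsso", cols, row, key_len=2)
second = insert_if_absent(conn, "ChemAsso", cols, row, key_len=2)
row_count = conn.execute("SELECT COUNT(*) FROM ChemAsso").fetchone()[0]
```

One caveat applies to both the original code and this sketch: a `column = ?` comparison never matches when the bound value is `None`, so rows whose key fields are missing would be inserted again on every run.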
src/database_creation/data_utils.py ADDED
@@ -0,0 +1,207 @@
1
+ import numpy as np
2
+ import ast
3
+ import csv
4
+ import pickle
5
+ import os
6
+ import json
7
+
8
+
9
+ def subset_and_unwrapp(
10
+ df,
11
+ subset_list,
12
+ target_column
13
+ ):
14
+
15
+ subset_df = df.loc[:, subset_list].copy()
16
+
17
+ subset_df[target_column] = subset_df[target_column].apply(
18
+ lambda x: ast.literal_eval(x) if isinstance(
19
+ x,
20
+ str
21
+ ) and x != 'nan' else np.nan)
22
+
23
+ subset_df = subset_df.explode(target_column)
24
+
25
+ subset_df = subset_df.reset_index(drop=True)
26
+
27
+ return subset_df
28
+
29
+
30
+ def parse_overview_table(df_in):
31
+
32
+ columns = df_in.columns.to_list()
33
+
34
+ # Find columns that have string values representing lists
35
+ columns_with_lists = []
36
+ for column in df_in.columns:
37
+ if df_in[column].apply(
38
+ lambda x: isinstance(
39
+ x,
40
+ str
41
+ ) and x.startswith(
42
+ '['
43
+ ) and x.endswith(
44
+ ']')
45
+ ).any():
46
+
47
+ columns_with_lists.append(column)
48
+
49
+ overview_dict = {}
50
+ table_structure = ""
51
+
52
+ for column in columns[1:]:
53
+ if column in columns_with_lists:
54
+ temp_df = subset_and_unwrapp(
55
+ df_in,
56
+ [columns[0], column],
57
+ column
58
+ )
59
+
60
+ else:
61
+ temp_df = df_in[[columns[0], column]]
62
+
63
+ temp_key = "ID" + column
64
+
65
+ overview_dict[temp_key] = temp_df
66
+
67
+ table_text = "{table:" + temp_key + ","
68
+ column_text = "columns:[sampleId," + column + "]};"
69
+ entry = table_text + column_text
70
+ table_structure += entry
71
+
72
+ print(overview_dict.keys())
73
+
74
+ return overview_dict, table_structure
75
+
76
+
77
+ def get_unique_value_list(sample_search_df,
78
+ key,
79
+ file_path,
80
+ run=False):
81
+ if run:
82
+ com_list = sample_search_df[key].unique()
83
+ com_par_list = []
84
+ for item in com_list:
85
+ if isinstance(item, str):
86
+ python_list = ast.literal_eval(item)
87
+ com_par_list.extend(python_list)
88
+ else:
89
+ com_par_list.append("NULL")
90
+
91
+ com_par_list = list(set(com_par_list))
92
+
93
+ if key in com_par_list:
94
+ com_par_list.remove(key)
95
+
96
+ # Save unique values to CSV
97
+ with open(file_path, 'w', newline='') as file:
98
+ writer = csv.writer(file)
99
+ writer.writerows([[value] for value in com_par_list])
100
+
101
+ print(f"Unique values for {key} saved to", file_path)
102
+ return com_par_list
103
+
104
+ else:
105
+ print(f"Skipped for {key}")
106
+ com_par_list = []
107
+
108
+ with open(file_path, 'r') as file:
109
+ reader = csv.reader(file)
110
+ for row in reader:
111
+ com_par_list.append(row[0])
112
+
113
+ return com_par_list
114
+
115
+
116
+ def extract_unique_options(
117
+ sample_search_df,
118
+ data_path,
119
+ run=False):
120
+
121
+ compo_file_path = os.path.join(data_path, "composition_list.csv")
122
+ compositions = get_unique_value_list(
123
+ sample_search_df,
124
+ 'composition',
125
+ compo_file_path,
126
+ run=run)
127
+
128
+ functions_file_path = os.path.join(data_path, "functions_list.csv")
129
+ functions = get_unique_value_list(
130
+ sample_search_df,
131
+ 'functions',
132
+ functions_file_path,
133
+ run=run)
134
+
135
+ characterizations_file_path = os.path.join(data_path, "characterizations_list.csv")
136
+ characterizations = get_unique_value_list(
137
+ sample_search_df,
138
+ 'characterizations',
139
+ characterizations_file_path,
140
+ run=run)
141
+
142
+ return {
143
+ "compositions": compositions,
144
+ "functions": functions,
145
+ "characterizations": characterizations
146
+ }
147
+
148
+
149
+ def load_sample_list(
150
+ output_data_dir,
151
+ read_sample,
152
+ n_to_read,
153
+ data_path
154
+ ):
155
+ sample_list_all_path = os.path.join(data_path, "sample_list_all.pickle")
156
+ Json_list = os.listdir(output_data_dir)
157
+ if read_sample:
158
+
159
+ sample_list = []
160
+ for i in range(0, n_to_read):
161
+ file_path = os.path.join(
162
+ output_data_dir,
163
+ Json_list[i])
164
+
165
+ with open(file_path, 'r') as file:
166
+ sample_list.append(json.load(file))
167
+
168
+ with open(sample_list_all_path, 'wb') as file:
169
+ pickle.dump(sample_list, file)
170
+ else:
171
+ with open(sample_list_all_path, 'rb') as file:
172
+ sample_list = pickle.load(file)
173
+
174
+ return sample_list
175
+
176
+
177
+ def parse_dictionary(source_dictionary,
178
+ key,
179
+ new_dictionary):
180
+ for entry in source_dictionary:
181
+ new_dictionary[str(entry)] = source_dictionary[entry][key]
182
+
183
+
184
+ def parse_overview_raw_data(
185
+ sample_search_df,
186
+ overview_data_path,
187
+ re_parse_overview=False
188
+ ):
189
+
190
+ overview_dict_path = os.path.join(overview_data_path, "overview_dict.pickle")
191
+ overview_table_path = os.path.join(overview_data_path, "overview_table.pickle")
192
+ if re_parse_overview:
193
+ overview_dict, table_structure = parse_overview_table(sample_search_df)
194
+
195
+ with open(overview_dict_path, 'wb') as file:
196
+ pickle.dump(overview_dict, file)
197
+
198
+ with open(overview_table_path, 'wb') as file:
199
+ pickle.dump(table_structure, file)
200
+ else:
201
+ with open(overview_dict_path, 'rb') as file:
202
+ overview_dict = pickle.load(file)
203
+
204
+ with open(overview_table_path, 'rb') as file:
205
+ table_structure = pickle.load(file)
206
+
207
+ return overview_dict, table_structure
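Review note: `get_unique_value_list` collects the distinct members of cells that hold stringified Python lists. The core of that step can be sketched without pandas, as a rough illustration only. The function name `unique_list_values` is hypothetical, and the sample cells are invented for the example.

```python
import ast
import math


def unique_list_values(cells, missing="NULL"):
    """Collect the distinct items found in cells that hold stringified lists.

    String cells are parsed with ast.literal_eval, which only accepts Python
    literals. Missing cells (None or NaN) contribute the `missing` marker.
    """
    unique = set()
    for cell in cells:
        if isinstance(cell, str):
            unique.update(ast.literal_eval(cell))
        elif cell is None or (isinstance(cell, float) and math.isnan(cell)):
            unique.add(missing)
    return sorted(unique)


# Invented sample column: two stringified lists and one missing value.
cells = ["['lipid', 'polymer']", "['polymer']", float("nan")]
values = unique_list_values(cells)
```

Testing with `isinstance(cell, str)` avoids relying on object identity with `np.nan`, which does not hold for every NaN value a DataFrame can contain.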
src/database_creation/database_creation.py ADDED
@@ -0,0 +1,100 @@
1
+ import data_utils as du
2
+ import nanomaterial_entity_creation as nec
3
+ import functionalizing_entity_creation as fec
4
+ import chemical_association_creation as cac
5
+ import general_info_creation as gic
6
+ import publication_info_creation as pic
7
+ import characterization_info_creation as cic
8
+ import sqlite3
9
+
10
+
11
+ def SQLdb_creation(
12
+ db_name,
13
+ sample_search_df,
14
+ overview_data_path,
15
+ sample_list,
16
+ connection,
17
+ re_parse=False
18
+ ):
19
+ """
20
+ Please note that this portion of the script is to
21
+ show how the database was created from the existing public
22
+ data download from caNanoLab and for the community
23
+ to inspect for any misconfiguration in the setup
24
+ of the backing database for the caNanoLibrarian App.
25
+ This is not intended to encourage the public to ramp up
26
+ download requests to caNanoLab; please use the database
27
+ responsibly. Thank you.
28
+ """
29
+ table_names = []
30
+ table_schemas = []
31
+ resulting_schema_dict = {}
32
+
33
+ if re_parse:
34
+ overview_dict, table_structure = du.parse_overview_raw_data(
35
+ sample_search_df,
36
+ overview_data_path,
37
+ re_parse_overview=False
38
+ )
39
+
40
+ composition_dt = {}
41
+ characterization_dt = {}
42
+ publication_dt = {}
43
+ contact_dt = {}
44
+ for sample in sample_list:
45
+ sampleID = str(sample['sampleID'])
46
+ composition_dt[sampleID] = sample['composition']
47
+ characterization_dt[sampleID] = sample['characterization']
48
+ publication_dt[sampleID] = sample['publication']
49
+ contact_dt[sampleID] = sample['contact']
50
+
51
+ connection = sqlite3.connect(db_name)
52
+
53
+ print("Writing GeneralInfo...")
54
+ gic.create_general_info_tables(connection)
55
+ gic.general_info_to_sql(contact_dt, connection)
56
+
57
+ print("Writing NanoMaterialEntity...")
58
+ nec.create_nanomaterial_entity_tables(connection)
59
+ nec.nanomaterialentity_to_sql(composition_dt, connection)
60
+
61
+ print("Writing FunctionalizingEntity...")
62
+ fec.create_functionalizing_entity_tables(connection)
63
+ fec.functionalizingentity_to_sql(composition_dt, connection)
64
+
65
+ print("Writing ChemicalAssociation...")
66
+ cac.create_ChemAsso_tables(connection)
67
+ cac.chemicalassociation_to_sql(composition_dt, connection)
68
+
69
+ print("Writing Characterization...")
70
+ cic.create_characterization_dt_tables(connection)
71
+ cic.characterization_to_sql(characterization_dt, connection)
72
+
73
+ print("Writing Publication...")
74
+ pic.create_Publication_Info_tables(connection)
75
+ pic.publication_info_to_sql(publication_dt, connection)
76
+
77
+ print("Done!")
78
+ else:
79
+ cursor = connection.cursor()
80
+ # Get the list of tables in the database
81
+ cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
82
+ tables = cursor.fetchall()
83
+ # Print the schema of each table
84
+ for table in tables:
85
+ table_name = table[0]
86
+ table_names.append(table_name)
87
+ # print(f"Table: {table_name}")
88
+ # print("Schema:")
89
+ cursor.execute(f"PRAGMA table_info({table_name});")
90
+ schema = cursor.fetchall()
91
+ current_schema = []
92
+ for column in schema:
93
+ current_schema.append(f"{column[1]}: {column[2]}")
94
+ table_schemas.append(current_schema)
95
+ resulting_schema_dict[table_name] = current_schema
96
+
97
+ # Close the cursor (the caller owns the connection)
98
+ cursor.close()
99
+
100
+ return resulting_schema_dict
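Review note: the `else` branch of `SQLdb_creation` reads the schema back from `sqlite_master` and `PRAGMA table_info`. The same idea is sketched here as a standalone function run against an in-memory database. The function name `read_schema` is hypothetical; quoting the table name is an added precaution, not something the commit does.

```python
import sqlite3


def read_schema(connection):
    """Return {table_name: ["column: declared type", ...]} for every table."""
    schema = {}
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
        for (table_name,) in cursor.fetchall():
            # The table name is interpolated into the statement, so quote it
            # as an identifier and double any embedded quote characters.
            quoted = '"' + table_name.replace('"', '""') + '"'
            cursor.execute(f"PRAGMA table_info({quoted});")
            # Each row is (cid, name, type, notnull, dflt_value, pk).
            schema[table_name] = [
                f"{col[1]}: {col[2]}" for col in cursor.fetchall()
            ]
    finally:
        cursor.close()
    return schema


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GeneralInfo (ID INT, sampleName VARCHAR(150));")
schema = read_schema(conn)
```

Returning a dictionary keyed by table name mirrors `resulting_schema_dict` in the commit, so a caller could use either interchangeably.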
src/database_creation/functionalizing_entity_creation.py ADDED
@@ -0,0 +1,146 @@
1
+ import data_utils as du
2
+ import sql_utils as su
3
+ from tqdm.notebook import tqdm_notebook
4
+
5
+
6
+ def create_functionalizing_entity_tables(connection):
7
+
8
+ FuncEntDes_str = "CREATE TABLE IF NOT EXISTS FuncEntDes " + \
9
+ "(ID INT, FunctionEntity VARCHAR(255), FunctionEntityType " + \
10
+ "VARCHAR(150), Description TEXT, ActivationMethod TEXT," + \
11
+ " pubChemID VARCHAR(50), MolarMass VARCHAR(100)," + \
12
+ " MolarMassUnit VARCHAR(100));"
13
+
14
+ FuncEntFunction_str = "CREATE TABLE IF NOT EXISTS FuncEntFunction " + \
15
+ "(ID INT, FunctionEntity VARCHAR(255), Function VARCHAR(255), " + \
16
+ "FunctionDescription TEXT);"
17
+
18
+ table_creation_querys = [
19
+ FuncEntDes_str,
20
+ FuncEntFunction_str
21
+ ]
22
+
23
+ for query in table_creation_querys:
24
+ results = su.simple_querry(
25
+ connection,
26
+ query
27
+ )
28
+ if len(results) == 0:
29
+ print("Table Exists")
30
+ else:
31
+ print(results)
32
+ su.show_tables(connection)
33
+
34
+
35
+ def functionalizingentity_to_sql(
36
+ composition_dt,
37
+ connection
38
+ ):
39
+
40
+ functionalizingentity_dt = {}
41
+
42
+ du.parse_dictionary(
43
+ composition_dt,
44
+ "functionalizingentity",
45
+ functionalizingentity_dt
46
+ )
47
+
48
+ total_ids = len(functionalizingentity_dt)
49
+
50
+ with tqdm_notebook(
51
+ total=total_ids,
52
+ desc='Processing',
53
+ unit='ID'
54
+ ) as progress_bar:
55
+
56
+ for ID in functionalizingentity_dt:
57
+ sample_info = functionalizingentity_dt[ID]
58
+ for FuncEntity in sample_info:
59
+ FuncEntity_info = sample_info[FuncEntity]
60
+ for Entry in FuncEntity_info:
61
+ # Create a cursor object
62
+ cursor = connection.cursor()
63
+
64
+ # Write to FuncEntDes
65
+ FuncEntDes_insert = (
66
+ ID,
67
+ Entry['Name'],
68
+ FuncEntity,
69
+ Entry['description'],
70
+ Entry['ActivationMethod'],
71
+ Entry['pubChemID'],
72
+ Entry['value'],
73
+ Entry['valueUnit']
74
+ )
75
+
76
+ # Execute a SELECT statement to check
77
+ # if the entry already exists
78
+
79
+ search_query = "SELECT COUNT(*) FROM FuncEntDes" + \
80
+ " WHERE ID = ? AND FunctionEntity = ?" + \
81
+ " AND FunctionEntityType = ?" + \
82
+ " AND Description = ?" + \
83
+ " AND ActivationMethod = ?" + \
84
+ " AND pubChemID = ?" + \
85
+ " AND MolarMass = ?" + \
86
+ " AND MolarMassUnit = ?;"
87
+
88
+ cursor.execute(search_query, FuncEntDes_insert)
89
+ count = cursor.fetchone()[0]
90
+
91
+ # Check the count to determine if the entry exists
92
+ if count == 0:
93
+ # Entry does not exist, proceed with insertion
94
+ insert_query = "INSERT INTO FuncEntDes " + \
95
+ "(ID, FunctionEntity, " + \
96
+ "FunctionEntityType, " + \
97
+ "Description, " + \
98
+ "ActivationMethod, pubChemID, " + \
99
+ "MolarMass, MolarMassUnit) " + \
100
+ "VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
101
+ cursor.execute(insert_query, FuncEntDes_insert)
102
+ connection.commit()
103
+ # else:
104
+ # # Entry already exists, skip
105
+ # print("Entry already exists, skipping...")
106
+
107
+ # Commit the changes
108
+ connection.commit()
109
+
110
+ for function in Entry['Functions']:
111
+
112
+ # Write to FuncEntFunction
113
+ FuncEntFunction_insert = (
114
+ ID,
115
+ Entry['Name'],
116
+ function['Type'],
117
+ function['FunctionDescription']
118
+ )
119
+
120
+ # Execute a SELECT statement to check
121
+ # if the entry already exists
122
+ search_query = "SELECT COUNT(*) " + \
123
+ "FROM FuncEntFunction WHERE " + \
124
+ "ID = ? AND FunctionEntity = ? " + \
125
+ "AND Function = ? " + \
126
+ "AND FunctionDescription = ?;"
127
+
128
+ cursor.execute(search_query, FuncEntFunction_insert)
129
+ count = cursor.fetchone()[0]
130
+
131
+ # Check the count to determine if the entry exists
132
+ if count == 0:
133
+ # Entry does not exist, proceed with insertion
134
+ insert_query = "INSERT INTO FuncEntFunction " + \
135
+ "(ID, FunctionEntity, " + \
136
+ "Function, " + \
137
+ "FunctionDescription) " + \
138
+ "VALUES (?, ?, ?, ?)"
139
+ cursor.execute(insert_query,
140
+ FuncEntFunction_insert)
141
+ connection.commit()
142
+ # else:
143
+ # # Entry already exists, skip
144
+ # print("Entry already exists, skipping...")
145
+ progress_bar.update(1)
146
+ cursor.close()
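Review note: an alternative to the SELECT-then-INSERT check used throughout these files is to let SQLite enforce uniqueness itself. The sketch below assumes a unique index over all four `FuncEntFunction` columns; the index name `FuncEntFunction_unique` and the sample rows are hypothetical and do not appear in the commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS FuncEntFunction ("
    "ID INT, FunctionEntity VARCHAR(255), Function VARCHAR(255), "
    "FunctionDescription TEXT);"
)
# Hypothetical unique index: it makes duplicate rows a constraint violation.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS FuncEntFunction_unique "
    "ON FuncEntFunction (ID, FunctionEntity, Function, FunctionDescription);"
)

rows = [
    (1, "PEG", "targeting", "binds receptor"),
    (1, "PEG", "targeting", "binds receptor"),  # exact duplicate, ignored
    (1, "PEG", "imaging", "fluorescent label"),
]

# "with conn" commits once on success instead of once per row.
with conn:
    conn.executemany(
        "INSERT OR IGNORE INTO FuncEntFunction "
        "(ID, FunctionEntity, Function, FunctionDescription) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )

stored = conn.execute("SELECT COUNT(*) FROM FuncEntFunction").fetchone()[0]
```

This trades one query per row for a single batched statement. SQLite treats NULL values as distinct in unique indexes, so rows with missing fields would still need separate handling.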
src/database_creation/general_info_creation.py ADDED
@@ -0,0 +1,135 @@
1
+ import sql_utils as su
2
+ from tqdm.notebook import tqdm_notebook
3
+ import datetime
4
+
5
+
6
+ def create_general_info_tables(connection):
7
+
8
+ general_info_string = "CREATE TABLE IF NOT EXISTS GeneralInfo " + \
9
+ "(ID INT, sampleName VARCHAR(150), " + \
10
+ "createdYear INT, createdMonth INT);"
11
+
12
+ keyword_info_string = "CREATE TABLE IF NOT EXISTS " + \
13
+ "SampleKeyWords (ID INT, sampleName VARCHAR(150), " + \
14
+ "SampleKeyWord VARCHAR(150));"
15
+
16
+ table_creation_querys = [
17
+ general_info_string,
18
+ keyword_info_string
19
+ ]
20
+
21
+ for query in table_creation_querys:
22
+ results = su.simple_querry(
23
+ connection,
24
+ query
25
+ )
26
+ if len(results) == 0:
27
+ print("Table Exists")
28
+ else:
29
+ print(results)
30
+ su.show_tables(connection)
31
+
32
+
33
+ def general_info_to_sql(
34
+ contact_dt,
35
+ connection
36
+ ):
37
+
38
+ total_ids = len(contact_dt)
39
+
40
+ with tqdm_notebook(
41
+ total=total_ids,
42
+ desc='Processing',
43
+ unit='ID'
44
+ ) as progress_bar:
45
+
46
+ # Create a cursor object
47
+ cursor = connection.cursor()
48
+
49
+ for ID in contact_dt:
50
+ sample_info = contact_dt[ID]
51
+
52
+ datetime_obj = datetime.datetime.fromtimestamp(
53
+ sample_info['createdDate'] / 1000
54
+ )
55
+
56
+ year = datetime_obj.year
57
+ month = datetime_obj.month
58
+
59
+ # Write to GeneralInfo
60
+ GeneralInfo_insert = (
61
+ ID,
62
+ sample_info['sampleName'],
63
+ year,
64
+ month
65
+ )
66
+
67
+ # Execute a SELECT statement to check
68
+ # if the entry already exists
69
+ search_query = "SELECT COUNT(*) " + \
70
+ "FROM GeneralInfo WHERE " + \
71
+ "ID = ? AND sampleName = ? " + \
72
+ "AND createdYear = ? AND createdMonth = ?;"
73
+
74
+ cursor.execute(search_query, GeneralInfo_insert)
75
+ count = cursor.fetchone()[0]
76
+
77
+ # Check the count to determine if the entry exists
78
+ if count == 0:
79
+ # Entry does not exist, proceed with insertion
80
+ insert_query = "INSERT INTO GeneralInfo (" + \
81
+ "ID, " + \
82
+ "sampleName, " + \
83
+ "createdYear, " + \
84
+ "createdMonth) " + \
85
+ "VALUES (?, ?, ?, ?)"
86
+
87
+ cursor.execute(
88
+ insert_query,
89
+ GeneralInfo_insert
90
+ )
91
+
92
+ connection.commit()
93
+
94
+ if sample_info['keywords'] and '<br />' in sample_info['keywords']:
95
+ keyword_list = sample_info['keywords'].split("<br />")
96
+ else:
97
+ if sample_info['keywords']:
98
+ keyword_list = [sample_info['keywords']]
99
+ else:
100
+ keyword_list = ["None"]
101
+
102
+ for keyword in keyword_list:
103
+ # Write to SampleKeyWords
104
+ keyword_insert = (
105
+ ID,
106
+ sample_info['sampleName'],
107
+ keyword
108
+ )
109
+
110
+ # Execute a SELECT statement to check
111
+ # if the entry already exists
112
+ search_query = "SELECT COUNT(*) " + \
113
+ "FROM SampleKeyWords WHERE " + \
114
+ "ID = ? AND sampleName = ? " + \
115
+ "AND SampleKeyWord = ?;"
116
+
117
+ cursor.execute(search_query, keyword_insert)
118
+ count = cursor.fetchone()[0]
119
+
120
+ # Check the count to determine if the entry exists
121
+ if count == 0:
122
+ # Entry does not exist, proceed with insertion
123
+ insert_query = "INSERT INTO SampleKeyWords (" + \
124
+ "ID, " + \
125
+ "sampleName, " + \
126
+ "SampleKeyWord) " + \
127
+ "VALUES (?, ?, ?)"
128
+
129
+ cursor.execute(
130
+ insert_query,
131
+ keyword_insert
132
+ )
133
+
134
+ progress_bar.update(1)
135
+ cursor.close()
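Review note: `general_info_to_sql` converts a millisecond timestamp with `datetime.fromtimestamp`, which uses the local time zone, so the stored month can differ between machines for samples created near a month boundary. The sketch below pins the conversion to UTC and isolates the keyword splitting. Both function names are hypothetical, and stripping whitespace around keywords is an added assumption that the commit does not make.

```python
from datetime import datetime, timezone


def created_year_month(created_ms):
    """Convert a millisecond Unix timestamp to (year, month) in UTC."""
    moment = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc)
    return moment.year, moment.month


def split_keywords(raw):
    """Split a '<br />'-separated keyword string, or return ['None'] if empty."""
    if not raw:
        return ["None"]
    return [part.strip() for part in raw.split("<br />") if part.strip()]


year, month = created_year_month(1262304000000)  # 2010-01-01T00:00:00Z
keywords = split_keywords("gold<br />nanoparticle")
no_keywords = split_keywords(None)
```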
src/database_creation/nanomaterial_entity_creation.py ADDED
@@ -0,0 +1,175 @@
1
+ import data_utils as du
2
+ import sql_utils as su
3
+ from tqdm.notebook import tqdm_notebook
4
+
5
+
6
+ def create_nanomaterial_entity_tables(connection):
7
+
8
+ NanoEntDes_str = "CREATE TABLE IF NOT EXISTS `NanoEntDes` " + \
9
+ "(`ID` INT, `NanoEntity` VARCHAR(255), `Description` TEXT);"
10
+
11
+ NanoEntCom_str = "CREATE TABLE IF NOT EXISTS NanoEntCom " + \
12
+ "(ID INT, NanoEntity VARCHAR(255), Composition VARCHAR(100), " + \
13
+ "CompositionType VARCHAR(100), MolecularWeight VARCHAR(150), " + \
14
+ "PubChemID VARCHAR(255));"
15
+
16
+ table_creation_querys = [
17
+ NanoEntDes_str,
18
+ NanoEntCom_str
19
+ ]
20
+
21
+ for query in table_creation_querys:
22
+ results = su.simple_querry(
23
+ connection,
24
+ query
25
+ )
26
+ if len(results) == 0:
27
+ print("Table Exists")
28
+ else:
29
+ print(results)
30
+ su.show_tables(connection)
31
+
32
+
33
+ def nanomaterialentity_to_sql(
34
+ composition_dt,
35
+ connection
36
+ ):
37
+
38
+ nanomaterialentity_dt = {}
39
+
40
+ du.parse_dictionary(
41
+ composition_dt,
42
+ "nanomaterialentity",
43
+ nanomaterialentity_dt
44
+ )
45
+
46
+ total_ids = len(nanomaterialentity_dt)
47
+
48
+ with tqdm_notebook(
49
+ total=total_ids,
50
+ desc='Processing',
51
+ unit='ID'
52
+ ) as progress_bar:
53
+
54
+ for ID in nanomaterialentity_dt:
55
+ sample_info = nanomaterialentity_dt[ID]
56
+ for NanoEntity in sample_info:
57
+ NanoEntity_info = sample_info[NanoEntity]
58
+ for Entry in NanoEntity_info:
59
+ # Create a cursor object
60
+ cursor = connection.cursor()
61
+
62
+ # Write to NanoEntDes
63
+ NanoEntDes_insert = (ID, NanoEntity, Entry['Description'])
64
+
65
+ # Execute a SELECT statement to check
66
+ # if the entry already exists
67
+
68
+ search_query = "SELECT COUNT(*) FROM NanoEntDes" + \
69
+ " WHERE ID = ? AND NanoEntity = ?" + \
70
+ " AND Description = ?"
71
+
72
+ cursor.execute(search_query, NanoEntDes_insert)
73
+ count = cursor.fetchone()[0]
74
+
75
+ # Check the count to determine if the entry exists
76
+ if count == 0:
77
+ # Entry does not exist, proceed with insertion
78
+ insert_query = "INSERT INTO NanoEntDes " + \
79
+ "(ID, NanoEntity, Description) " + \
80
+ "VALUES (?, ?, ?)"
81
+ cursor.execute(insert_query, NanoEntDes_insert)
82
+ connection.commit()
83
+ # else:
84
+ # # Entry already exists, skip
85
+ # print("Entry already exists, skipping...")
86
+
87
+ # Commit the changes
88
+ connection.commit()
89
+
90
+ for composition in Entry['ComposingElements']:
91
+ if not composition.get(
92
+ 'DisplayName'
93
+ ):
94
+
95
+ composition_type = "NULL"
96
+ composition_name = "NULL"
97
+ composition_MolecularWeight = "NULL"
98
+
99
+ else:
100
+ # Extract composition_type
101
+ displayname = composition['DisplayName']
102
+ index_open = displayname.find("(")
103
+ composition_type = displayname[
104
+ :index_open
105
+ ].strip() if index_open != -1 else "NULL"
106
+
107
+ # Extract composition_name and composition_MolecularWeight
108
+ index_name = displayname.find(
109
+ "name: "
110
+ ) + len("name: ")
111
+ index_amount = displayname.find(", amount: ")
112
+
113
+ if "name: " in displayname:
114
+ if index_amount != -1:
115
+ composition_name = displayname[
116
+ index_name:index_amount
117
+ ].strip()
118
+ composition_MolecularWeight = displayname[
119
+ index_amount + len(", amount: "):-1
120
+ ].strip()
121
+ else:
122
+ composition_name = displayname[
123
+ index_name:-1
124
+ ].strip()
125
+ composition_MolecularWeight = "NULL"
126
+ else:
127
+ composition_name = "NULL"
128
+ composition_MolecularWeight = "NULL"
129
+
130
+ if not composition.get(
131
+ 'PubChemId'
132
+ ):
133
+ PubChemID = "NULL"
134
+ else:
135
+ PubChemID = composition['PubChemId']
136
+
137
+ # Write to NanoEntCom
138
+ NanoEntCom_insert = (
139
+ ID,
140
+ NanoEntity,
141
+ composition_name,
142
+ composition_type,
143
+ composition_MolecularWeight,
144
+ PubChemID
145
+ )
146
+
147
+ # Execute a SELECT statement to check
148
+ # if the entry already exists
149
+ search_query = "SELECT COUNT(*) " + \
150
+ "FROM NanoEntCom WHERE " + \
151
+ "ID = ? AND NanoEntity = ? " + \
152
+ "AND Composition = ? " + \
153
+ "AND CompositionType = ? " + \
154
+ "AND MolecularWeight = ? " + \
155
+ "AND PubChemID = ?"
156
+
157
+ cursor.execute(search_query, NanoEntCom_insert)
158
+ count = cursor.fetchone()[0]
159
+
160
+ # Check the count to determine if the entry exists
161
+ if count == 0:
162
+ # Entry does not exist, proceed with insertion
163
+ insert_query = "INSERT INTO NanoEntCom " + \
164
+ "(ID, NanoEntity, " + \
165
+ "Composition, " + \
166
+ "CompositionType, " + \
167
+ "MolecularWeight, PubChemID) " + \
168
+ "VALUES (?, ?, ?, ?, ?, ?)"
169
+ cursor.execute(insert_query, NanoEntCom_insert)
170
+ connection.commit()
171
+ # else:
172
+ # # Entry already exists, skip
173
+ # print("Entry already exists, skipping...")
174
+ progress_bar.update(1)
175
+ cursor.close()
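Review note: `nanomaterialentity_to_sql` slices `DisplayName` by hand using `find` offsets. Judging only from those offsets, the string appears to look like `type (name: X, amount: Y)`; that format is an assumption inferred from the code, not confirmed by the commit. Under that assumption, a regular expression could express the parsing in one place. The function name `parse_display_name` is hypothetical.

```python
import re

# Assumed layout: "<type> (name: <name>[, amount: <amount>])"
DISPLAY_NAME = re.compile(
    r"^(?P<type>[^(]*)\(name:\s*(?P<name>.*?)"
    r"(?:,\s*amount:\s*(?P<amount>.*))?\)\s*$"
)


def parse_display_name(display_name):
    """Return (composition_type, composition_name, amount), 'NULL' when absent."""
    if not display_name:
        return "NULL", "NULL", "NULL"
    match = DISPLAY_NAME.match(display_name)
    if match is None:
        return "NULL", "NULL", "NULL"
    amount = match.group("amount")
    return (
        match.group("type").strip() or "NULL",
        match.group("name").strip() or "NULL",
        amount.strip() if amount else "NULL",
    )


with_amount = parse_display_name("core (name: gold, amount: 5.0 mg)")
without_amount = parse_display_name("coating (name: PEG)")
empty = parse_display_name("")
```

Keeping the `"NULL"` placeholders matches what the commit stores for missing fields, so the output could be dropped into `NanoEntCom_insert` unchanged.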
src/database_creation/publication_info_creation.py ADDED
@@ -0,0 +1,202 @@
1
+ import sql_utils as su
2
+ from tqdm.notebook import tqdm_notebook
3
+ import re
4
+
5
+
6
+ def create_Publication_Info_tables(connection):
7
+
8
+ general_info_string = "CREATE TABLE IF NOT EXISTS PublicationInfo " + \
9
+ "(ID INT, PMID INT, year INT, " + \
10
+ "title TEXT, author TEXT, journal TEXT, " + \
11
+ "publicationCategories TEXT, description TEXT);"
12
+
13
+ keyword_info_string = "CREATE TABLE IF NOT EXISTS " + \
14
+ "PublicationKeyWords (ID INT, PublicationKeyWord VARCHAR(150));"
15
+
16
+ table_creation_querys = [
17
+ general_info_string,
18
+ keyword_info_string
19
+ ]
20
+
21
+ for query in table_creation_querys:
22
+ results = su.simple_querry(
23
+ connection,
24
+ query
25
+ )
26
+ if len(results) == 0:
27
+ print("Table Exists")
28
+ else:
29
+ print(results)
30
+
31
+ su.show_tables(connection)
32
+
33
+
+ def publication_info_to_sql(
+     publication_dt,
+     connection
+ ):
+
+     total_ids = len(publication_dt)
+
+     def extract_information(string):
+         # Expected shape:
+         # "<author>. <title>. <journal>. <year>;... PMID: <number>"
+         pattern = r"(.*?)\.\s*(\d{4});.*?PMID:\s*(\d+)"
+         match = re.search(pattern, string)
+
+         if match:
+             publication = match.group(1).strip()
+             year = match.group(2).strip()
+             pmid = match.group(3)
+
+             pub_list = publication.split(".")
+             if len(pub_list) <= 2:
+                 if "," not in pub_list[0]:
+                     # No author list: search field by field, keeping
+                     # the values captured above whenever a fallback
+                     # pattern does not match.
+                     year_match = re.search(r'\b(\d{4})\b', string)
+                     if year_match:
+                         year = year_match.group(1)
+
+                     # Extracting the journal name
+                     journal_match = re.search(r'\. (.+?)\.', string)
+                     journal = journal_match.group(1) \
+                         if journal_match else "None"
+
+                     # Extracting the PMID
+                     pmid_match = re.search(r'PMID: (\d+)', string)
+                     if pmid_match:
+                         pmid = pmid_match.group(1)
+
+                     title_match = re.search(r'^([^\.]+)\.', string)
+                     title = title_match.group(1).strip() \
+                         if title_match else "None"
+
+                     return {
+                         'journal': journal,
+                         'pmid': pmid,
+                         'year': year,
+                         'title': title,
+                         'author': "None"
+                     }
+                 else:
+                     # Unparsed citation: log it and fall through
+                     # to the default record below.
+                     print(pub_list)
+             else:
+                 return {
+                     'journal': pub_list[2].strip(),
+                     'pmid': pmid,
+                     'year': year,
+                     'title': pub_list[1].strip(),
+                     'author': pub_list[0].strip()
+                 }
+
+         return {
+             'journal': "None",
+             'pmid': "None",
+             'year': "None",
+             'title': "None",
+             'author': "None"
+         }
+
+     with tqdm_notebook(
+         total=total_ids,
+         desc='Processing',
+         unit='ID'
+     ) as progress_bar:
+
+         # Create a cursor object
+         cursor = connection.cursor()
+
+         for ID in publication_dt:
+             sample_info = publication_dt[ID]
+
+             for category in sample_info['category2Publications']:
+                 info_dict = sample_info['category2Publications'][category]
+                 for info in info_dict:
+
+                     citation_string = info['displayName']
+
+                     article_info = extract_information(citation_string)
+
+                     # Write to PublicationInfo
+                     publication_insert = (
+                         ID,
+                         article_info['pmid'],
+                         article_info['year'],
+                         article_info['title'],
+                         article_info['author'],
+                         article_info['journal'],
+                         category,
+                         info['description']
+                     )
+
+                     # Execute a SELECT statement to check
+                     # if the entry already exists
+                     search_query = "SELECT COUNT(*) " + \
+                         "FROM PublicationInfo WHERE " + \
+                         "ID = ? AND PMID = ? AND year = ? AND " + \
+                         "title = ? AND author = ? AND journal = ? AND " + \
+                         "publicationCategories = ? AND description = ?;"
+
+                     cursor.execute(search_query, publication_insert)
+                     count = cursor.fetchone()[0]
+
+                     # Check the count to determine if the entry exists
+                     if count == 0:
+                         # Entry does not exist, proceed with insertion
+                         insert_query = "INSERT INTO PublicationInfo (" + \
+                             "ID, " + \
+                             "PMID, " + \
+                             "year, " + \
+                             "title, " + \
+                             "author, " + \
+                             "journal, " + \
+                             "publicationCategories, " + \
+                             "description) " + \
+                             "VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
+
+                         cursor.execute(
+                             insert_query,
+                             publication_insert
+                         )
+
+                         connection.commit()
+
+                     # split() returns a one-item list when the string
+                     # contains no "<br />" separator
+                     keywords = info['keywordsDisplayName']
+                     if keywords:
+                         keyword_list = keywords.split("<br />")
+                     else:
+                         keyword_list = ["None"]
+
+                     for keyword in keyword_list:
+                         # Write to PublicationKeyWords
+                         keyword_insert = (
+                             ID,
+                             keyword
+                         )
+
+                         # Execute a SELECT statement to check
+                         # if the entry already exists
+                         search_query = "SELECT COUNT(*) " + \
+                             "FROM PublicationKeyWords WHERE " + \
+                             "ID = ? AND PublicationKeyWord = ?;"
+
+                         cursor.execute(search_query, keyword_insert)
+                         count = cursor.fetchone()[0]
+
+                         # Check the count to determine if the entry exists
+                         if count == 0:
+                             # Entry does not exist, proceed with insertion
+                             insert_query = "INSERT INTO " + \
+                                 "PublicationKeyWords (" + \
+                                 "ID, " + \
+                                 "PublicationKeyWord ) " + \
+                                 "VALUES (?, ?)"
+
+                             cursor.execute(
+                                 insert_query,
+                                 keyword_insert
+                             )
+
+                             # Commit so the last keyword rows are
+                             # not left uncommitted
+                             connection.commit()
+
+             progress_bar.update(1)
+         cursor.close()
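To make the main branch of `extract_information` easier to follow, here is a small standalone sketch of how its primary pattern splits a citation. The function name `parse_citation` and the citation string are invented for illustration; only the regular expression is taken from the file above, and the fallback searches for short citations are deliberately left out.

```python
import re

# Same primary pattern as extract_information in the diff above.
CITATION_PATTERN = r"(.*?)\.\s*(\d{4});.*?PMID:\s*(\d+)"


def parse_citation(citation):
    """Split '<author>. <title>. <journal>. <year>;... PMID: <n>' into fields.

    Hypothetical helper: returns None where the original code would use
    its fallback searches or its all-"None" default record.
    """
    match = re.search(CITATION_PATTERN, citation)
    if not match:
        return None

    # Everything before the year, split on periods.
    parts = match.group(1).split(".")
    if len(parts) < 3:
        return None

    return {
        "author": parts[0].strip(),
        "title": parts[1].strip(),
        "journal": parts[2].strip(),
        "year": match.group(2),
        "pmid": match.group(3),
    }


# Made-up citation string, not taken from any real dataset.
example = (
    "Doe J, Roe A. A sample title. J Example Res. "
    "2015;12(3):45-67. PMID: 12345678"
)
print(parse_citation(example))
```

Because the fields are separated by splitting on periods, an author list written with dotted initials (for example `Doe J.A.`) or a title containing a period would shift the title and journal fields, so this approach likely only suits citations in one fixed style.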
static/caNanoLablogo.jpg ADDED
templates/.ipynb_checkpoints/index-checkpoint.html ADDED
File without changes
templates/index.html ADDED
@@ -0,0 +1,252 @@
+ <!DOCTYPE html>
+ <html>
+ <head>
+     <title>caNanoWiki AI</title>
+     <style>
+         body {
+             font-family: Arial, sans-serif;
+             background-color: #001f3f; /* Dark blue */
+             color: #fff;
+             margin: 0;
+             padding: 20px;
+         }
+
+         h1 {
+             font-size: 32px;
+             text-align: center;
+             margin-bottom: 40px;
+         }
+
+         .logo {
+             display: block;
+             margin: 0 auto;
+             margin-bottom: 40px;
+             width: 200px;
+         }
+
+         form {
+             text-align: center;
+             margin-bottom: 40px;
+         }
+
+         label {
+             display: block;
+             font-size: 20px;
+             margin-bottom: 10px;
+         }
+
+         input[type="text"] {
+             font-size: 18px;
+             padding: 10px;
+             width: 500px;
+             border-radius: 10px;
+         }
+
+         button {
+             font-size: 18px;
+             padding: 10px 20px;
+             background-color: #00bfff;
+             color: #fff;
+             border: none;
+             border-radius: 10px;
+             cursor: pointer;
+             margin-top: 10px;
+         }
+
+         button:hover {
+             background-color: #0088cc;
+         }
+
+         .result {
+             background-color: #fff;
+             color: #000;
+             padding: 20px;
+             border-radius: 10px;
+             text-align: center;
+             box-shadow: 0 2px 4px rgba(0, 0, 0, 0.2);
+             margin: 0 auto;
+             max-width: 800px;
+             overflow-y: auto; /* Enable vertical scrollbar */
+             max-height: 480px; /* Set maximum height for the result window */
+         }
+
+         .export-button {
+             font-size: 18px;
+             padding: 10px 20px;
+             background-color: #008000; /* Green */
+             color: #fff;
+             border: none;
+             border-radius: 10px;
+             cursor: pointer;
+             margin-top: 10px;
+         }
+
+         .export-button:hover {
+             background-color: #006400; /* Dark green */
+         }
+
+         .loading {
+             font-size: 18px;
+             text-align: center;
+             margin-bottom: 20px;
+         }
+
+         .another-box.folded {
+             display: flex;
+             justify-content: center;
+             align-items: center;
+             background-color: #999;
+             color: #fff;
+             border-radius: 10px;
+             padding: 20px;
+             cursor: pointer;
+             /* Additional styling for the folded state */
+         }
+
+         .another-box {
+             background-color: #fff;
+             color: #000;
+             border-radius: 10px;
+             padding: 20px;
+             box-shadow: 0 2px 4px rgba(0, 0, 0, 0.2);
+             margin-top: 20px;
+             /* Additional styling for the unfolded state */
+         }
+
+         .dataframe {
+             border-collapse: collapse;
+             width: 100%;
+         }
+
+         .dataframe th,
+         .dataframe td {
+             border: 1px solid #ddd;
+             padding: 8px;
+             text-align: left;
+         }
+
+         .dataframe th {
+             background-color: #f2f2f2;
+             font-weight: bold;
+         }
+
+         .form-buttons {
+             display: flex;
+             justify-content: space-between;
+             align-items: center;
+         }
+
+         .form-buttons button {
+             margin: 10px;
+         }
+     </style>
+ </head>
+ <body>
+     <h1>Welcome to caNanoLibrarian!</h1>
+
+     <img src="{{ url_for('static', filename='caNanoLablogo.jpg') }}" alt="caNanoWiki AI Logo" class="logo">
+
+
+     <form method="POST" action="/">
+         <label for="user_input">How can I help?</label>
+         <input type="text" id="user_input" name="user_input" value="{{ user_input }}" autofocus>
+         <br>
+         <button type="submit">Search</button>
+         <!-- type="button" keeps this button from submitting the form -->
+         <button type="button" onclick="openPopup()">Database Structure</button>
+     </form>
+
+     {% if processing %}
+     <p class="loading">Working on it...</p>
+     {% endif %}
+
+     {% if processed_input %}
+     <div class="result">
+         <p>{{ processed_input | safe }}</p>
+     </div>
+     <button class="export-button" onclick="exportToCSV()">Export as CSV</button>
+     <div class="Source-Info" onclick="toggleFoldedState(this)">
+         <h2>SQL Query String</h2>
+         <p>{{ source_sections | nl2br | safe }}</p>
+     </div>
+     {% else %}
+     <div class="result" style="display: none;"></div>
+     {% endif %}
+
+     <script>
+         function toggleFoldedState(element) {
+             element.classList.toggle('folded');
+         }
+
+         function exportToCSV() {
+             // Get the HTML table element containing the DataFrame
+             var table = document.querySelector('.result table');
+             if (!table) {
+                 // The result holds no table, so there is nothing to export
+                 return;
+             }
+
+             // Collect one CSV line per table row
+             var csvRows = [];
+
+             // Iterate through each row in the table
+             for (var i = 0; i < table.rows.length; i++) {
+                 var row = table.rows[i];
+                 var csvCells = [];
+
+                 // Iterate through each cell in the row
+                 for (var j = 0; j < row.cells.length; j++) {
+                     var cell = row.cells[j];
+
+                     // Quote the cell value and double any embedded quotes
+                     var cellValue = cell.innerText.trim();
+                     csvCells.push('"' + cellValue.replace(/"/g, '""') + '"');
+                 }
+
+                 // Join the cells without a trailing comma
+                 csvRows.push(csvCells.join(','));
+             }
+
+             // Add a line break after each row
+             var csvContent = csvRows.join('\n') + '\n';
+
+             // Create a temporary link element
+             var link = document.createElement('a');
+             link.setAttribute('href', 'data:text/csv;charset=utf-8,' + encodeURIComponent(csvContent));
+             link.setAttribute('download', 'result.csv');
+             link.style.display = 'none';
+
+             // Add the link to the document and simulate a click event to trigger the download
+             document.body.appendChild(link);
+             link.click();
+             document.body.removeChild(link);
+         }
+
+         function openPopup() {
+             // Open the window once; a second window.open with the same
+             // name would only return this window again
+             var popupWindow = window.open("", "Popup Window", "width=400,height=500");
+             if (!popupWindow) {
+                 // The pop-up was blocked by the browser
+                 return;
+             }
+             var content = `
+                 <html>
+                 <head>
+                 <title>Pop-up Window</title>
+                 </head>
+                 <body>
+                 <h2>Database Structure</h2>
+                 <p>{table:NanoEntDes,attributes:[ID(Primary),NanoEntity,Description]}</p>
+                 <p>{table:NanoEntCom,attributes:[ID(Primary),NanoEntity,Composition,CompositionType,MolecularWeight,PubChemID]}</p>
+                 <p>{table:FuncEntDes,attributes:[ID(Primary),FunctionEntity,FunctionEntityType,Description,ActivationMethod,pubChemID,MolarMass,MolarMassUnit]}</p>
+                 <p>{table:FuncEntFunction,attributes:[ID(Primary),FunctionEntiry,Function,FunctionDescription]}</p>
+                 <p>{table:ChemAsso,attributes:[ID(Primary),AssociationType,BondType,Description,dataId,ComposingElementNameA,<br>
+                 ComposingElementNameB,CompositiontypeB,CompositiontypeA,DomainElementNameB,DomainElementNameA,DomainAssociationId,<br>
+                 ComposingElemetIdB,ComposingElemetIdA,ComposingElementTypeA,EntityDisplayNameB,ComposingElementTypeB,EntityDisplayNameA,AttachmentId]}</p>
+                 <p>{table:GeneralInfo,attributes:[ID(Primary),sampleName,createdYear,createdMonth]}</p>
+                 <p>{table:SampleKeyWords,attributes:[ID(Primary),sampleName,SampleKeyWord]}</p>
+                 <p>{table:PublicationInfo,attributes:[ID(Primary),PMID,year,title,author,journal,publicationCategories,description]}</p>
+                 <p>{table:PublicationKeyWords,attributes:[ID(Primary),PublicationKeyWord]}</p>
+                 <p>{table:CharacterizationInfo,attributes:[ID(Primary),CharType,CharName,AssayType,Protocol,DesignDescription,AnalysisAndConclusion]}</p>
+                 <p>{table:CharExpConfig,attributes:[ID(Primary),CharType,CharName,AssayType,ExpConfigTechnique,ExpConfigInstruments,ExpConfigDescription]}</p>
+                 <p>{table:CharResultDescriptions,attributes:[ID(Primary),CharType,CharName,AssayType,CharResultDescription]}</p>
+                 <p>{table:CharResultKeywords,attributes:[ID(Primary),CharType,CharName,AssayType,CharResultKeyword]}</p>
+                 <p>{table:CharResultTables,attributes:[ID(Primary),CharType,CharName,AssayType,CharTable]}</p>
+                 </body>
+                 </html>
+             `;
+
+             popupWindow.document.write(content);
+             popupWindow.document.close();
+         }
+     </script>
+ </body>
+ </html>
+
templates/login.html ADDED
@@ -0,0 +1,60 @@
+ <!DOCTYPE html>
+ <html>
+ <head>
+     <title>Login</title>
+     <style>
+         body {
+             background-color: #11182B;
+             font-family: Arial, sans-serif;
+             color: #FFFFFF;
+             display: flex;
+             align-items: center;
+             justify-content: center;
+             height: 100vh;
+         }
+
+         .login-container {
+             width: 350px;
+             padding: 20px;
+             background-color: #223E6D;
+             border-radius: 10px;
+         }
+
+         .login-container label, .login-container input {
+             display: block;
+             width: 90%;
+             margin-bottom: 10px;
+         }
+
+         .login-container input {
+             padding: 10px;
+             border-radius: 5px;
+         }
+
+         .login-container button {
+             display: block;
+             width: 100%;
+             padding: 10px;
+             border: none;
+             border-radius: 5px;
+             background-color: #2D82B7;
+             color: #FFFFFF;
+             cursor: pointer;
+         }
+
+         .login-container button:hover {
+             background-color: #1B577B;
+         }
+     </style>
+ </head>
+ <body>
+     <div class="login-container">
+         <h1>Login</h1>
+         <form method="POST" action="/login">
+             <label for="passcode">Please enter the passcode:</label>
+             <input type="password" id="passcode" name="passcode">
+             <button type="submit">Submit</button>
+         </form>
+     </div>
+ </body>
+ </html>
templates/result.html ADDED
File without changes