aya-se committed on
Commit 8499e52 · 1 Parent(s): 1414bb0

Added models and licenses

APACHE_LICENSE_VERSION_2.0.md ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
GEMMA_TERMS_OF_USE.md ADDED
@@ -0,0 +1,77 @@
+ Gemma Terms of Use
+
+ Last modified: April 1, 2024
+
+ By using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.
+
+ Section 1: DEFINITIONS
+ 1.1 Definitions
+ (a) "Agreement" or "Gemma Terms of Use" means these terms and conditions that govern the use, reproduction, Distribution or modification of the Gemma Services and any terms and conditions incorporated by reference.
+
+ (b) "Distribution" or "Distribute" means any transmission, publication, or other sharing of Gemma or Model Derivatives to a third party, including by providing or making Gemma or its functionality available as a hosted service via API, web access, or any other electronic or remote means ("Hosted Service").
+
+ (c) "Gemma" means the set of machine learning language models, trained model weights and parameters identified at ai.google.dev/gemma, regardless of the source that you obtained it from.
+
+ (d) "Google" means Google LLC.
+
+ (e) "Model Derivatives" means all (i) modifications to Gemma, (ii) works based on Gemma, or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Gemma, to that model in order to cause that model to perform similarly to Gemma, including distillation methods that use intermediate data representations or methods based on the generation of synthetic data Outputs by Gemma for training that model. For clarity, Outputs are not deemed Model Derivatives.
+
+ (f) "Output" means the information content output of Gemma or a Model Derivative that results from operating or otherwise using Gemma or the Model Derivative, including via a Hosted Service.
+
+ 1.2
+ As used in this Agreement, "including" means "including without limitation".
+
+ Section 2: ELIGIBILITY AND USAGE
+ 2.1 Eligibility
+ You represent and warrant that you have the legal capacity to enter into this Agreement (including being of sufficient age of consent). If you are accessing or using any of the Gemma Services for or on behalf of a legal entity, (a) you are entering into this Agreement on behalf of yourself and that legal entity, (b) you represent and warrant that you have the authority to act on behalf of and bind that entity to this Agreement and (c) references to "you" or "your" in the remainder of this Agreement refers to both you (as an individual) and that entity.
+
+ 2.2 Use
+ You may use, reproduce, modify, Distribute, perform or display any of the Gemma Services only in accordance with the terms of this Agreement, and must not violate (or encourage or permit anyone else to violate) any term of this Agreement.
+
+ Section 3: DISTRIBUTION AND RESTRICTIONS
+ 3.1 Distribution and Redistribution
+ You may reproduce or Distribute copies of Gemma or Model Derivatives if you meet all of the following conditions:
+
+ 1. You must include the use restrictions referenced in Section 3.2 as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Gemma or Model Derivatives and you must provide notice to subsequent users you Distribute to that Gemma or Model Derivatives are subject to the use restrictions in Section 3.2.
+ 2. You must provide all third party recipients of Gemma or Model Derivatives a copy of this Agreement.
+ 3. You must cause any modified files to carry prominent notices stating that you modified the files.
+ 4. All Distributions (other than through a Hosted Service) must be accompanied by a "Notice" text file that contains the following notice: "Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms".
+ You may add your own intellectual property statement to your modifications and, except as set forth in this Section, may provide additional or different terms and conditions for use, reproduction, or Distribution of your modifications, or for any such Model Derivatives as a whole, provided your use, reproduction, modification, Distribution, performance, and display of Gemma otherwise complies with the terms and conditions of this Agreement. Any additional or different terms and conditions you impose must not conflict with the terms of this Agreement.
+
+ 3.2 Use Restrictions
+ You must not use any of the Gemma Services:
+
+ 1. for the restricted uses set forth in the Gemma Prohibited Use Policy at ai.google.dev/gemma/prohibited_use_policy ("Prohibited Use Policy"), which is hereby incorporated by reference into this Agreement; or
+ 2. in violation of applicable laws and regulations.
+ To the maximum extent permitted by law, Google reserves the right to restrict (remotely or otherwise) usage of any of the Gemma Services that Google reasonably believes are in violation of this Agreement.
+
+ 3.3 Generated Output
+ Google claims no rights in Outputs you generate using Gemma. You and your users are solely responsible for Outputs and their subsequent uses.
+
+ Section 4: ADDITIONAL PROVISIONS
+ 4.1 Updates
+ Google may update Gemma from time to time.
+
+ 4.2 Trademarks
+ Nothing in this Agreement grants you any rights to use Google's trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between you and Google. Google reserves any rights not expressly granted herein.
+
+ 4.3 DISCLAIMER OF WARRANTY
+ UNLESS REQUIRED BY APPLICABLE LAW, THE GEMMA SERVICES, AND OUTPUTS, ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE GEMMA SERVICES OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR USE OR DISTRIBUTION OF ANY OF THE GEMMA SERVICES OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
+
+ 4.4 LIMITATION OF LIABILITY
+ TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY, CONTRACT, OR OTHERWISE, UNLESS REQUIRED BY APPLICABLE LAW, SHALL GOOGLE OR ITS AFFILIATES BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL, OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO, ANY OF THE GEMMA SERVICES OR OUTPUTS EVEN IF GOOGLE OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ 4.5 Term, Termination, and Survival
+ The term of this Agreement will commence upon your acceptance of this Agreement (including acceptance by your use, modification, or Distribution, reproduction, performance or display of any portion or element of the Gemma Services) and will continue in full force and effect until terminated in accordance with the terms of this Agreement. Google may terminate this Agreement if you are in breach of any term of this Agreement. Upon termination of this Agreement, you must delete and cease use and Distribution of all copies of Gemma and Model Derivatives in your possession or control. Sections 1, 2.1, 3.3, 4.2 to 4.9 shall survive the termination of this Agreement.
+
+ 4.6 Governing Law and Jurisdiction
+ This Agreement will be governed by the laws of the State of California without regard to choice of law principles. The UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The state and federal courts of Santa Clara County, California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
+
+ 4.7 Severability
+ If any provision of this Agreement is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
+
+ 4.8 Entire Agreement
+ This Agreement states all the terms agreed between the parties and supersedes all other agreements between the parties as of the date of acceptance relating to its subject matter.
+
+ 4.9 No Waiver
+ Google will not be treated as having waived any rights by not exercising (or delaying the exercise of) any rights under this Agreement.
LLAMA_3.1_ACCEPTABLE_USE_POLICY.md ADDED
@@ -0,0 +1,44 @@
+ **Llama 3.1** **Acceptable Use Policy**
+
+ Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.1. If you access or use Llama 3.1, you agree to this Acceptable Use Policy (“**Policy**”). The most recent copy of this policy can be found at <span style="text-decoration:underline;">https://llama.meta.com/llama3_1/use-policy</span>.
+
+ **Prohibited Uses**
+
+ We want everyone to use Llama 3.1 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.1 to:
+
+ - Violate the law or others’ rights, including to:
+ - Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
+ - Violence or terrorism
+ - Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
+ - Human trafficking, exploitation, and sexual violence
+ - The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
+ - Sexual solicitation
+ - Any other criminal activity
+ - Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
+ - Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
+ - Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
+ - Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
+ - Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials
+ - Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
+ - Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.1 related to the following:
+ - Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
+ - Guns and illegal weapons (including weapon development)
+ - Illegal drugs and regulated/controlled substances
+ - Operation of critical infrastructure, transportation technologies, or heavy machinery
+ - Self-harm or harm to others, including suicide, cutting, and eating disorders
+ - Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
+ - Intentionally deceive or mislead others, including use of Llama 3.1 related to the following:
+ - Generating, promoting, or furthering fraud or the creation or promotion of disinformation
+ - Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
+ - Generating, promoting, or further distributing spam
+ - Impersonating another individual without consent, authorization, or legal right
+ - Representing that the use of Llama 3.1 or outputs are human-generated
+ - Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
+ - Fail to appropriately disclose to end users any known dangers of your AI system
+
+ Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
+
+ - Reporting issues with the model: <span style="text-decoration:underline;">https://github.com/meta-llama/llama-models/issues</span>
+ - Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
+ - Reporting bugs and security concerns: facebook.com/whitehat/info
+ - Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.1: [email protected]
LLAMA_3.1_COMMUNITY_LICENSE_AGREEMENT.md ADDED
@@ -0,0 +1,48 @@
+ LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
+
+ Llama 3.1 Version Release Date: July 23, 2024
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.1 distributed by Meta at https://llama.meta.com/doc/overview.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
+
+ “Llama 3.1” means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at https://llama.meta.com/llama-downloads.
+
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any portion thereof) made available under this Agreement.
+
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
+
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials, you agree to be bound by this Agreement.
+
+ 1. License Rights and Redistribution.
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Llama Materials.
+
+ b. Redistribution and Use.
+
+ i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name.
+
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Llama Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
+
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by reference into this Agreement.
+
32
+ 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
33
+
34
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
35
+
36
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
37
+
38
+ 5. Intellectual Property.
39
+
40
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible at https://about.meta.com/brand/resources/meta/company-brand/). All goodwill arising out of your use of the Mark will inure to the benefit of Meta.
41
+
42
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Llama Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Llama Materials.
+
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
+
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
README.md CHANGED
@@ -1,11 +1,56 @@
  ---
  license: other
  license_name: mixed
- license_link: https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/README.md#license-information
+ license_link: https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/README.md
  language: ja
  pipeline_tag: text-classification
  library_name: fasttext
  ---

- # 教育的ウェブテキストのfastText分類器
- WIP
+ # Swallow Edu Classifier
+
+ [日本語版の README はこちら](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/README_ja.md)
+
+ ## Model summary
+
+ **NOTE**: These classifiers are designed to work only with **Japanese** text. Quality for English or other languages is not guaranteed.
+
+ This repository contains fastText classifiers for judging the educational value of Japanese web pages. It includes two types of classifiers:
+
+ 1. **Wiki-based classifier**: trained on Japanese Wikipedia text from academic categories. This classifier is released under the [Apache-2.0 License](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/APACHE_LICENSE_VERSION_2.0.md).
+ 2. **LLM-based classifier**: trained on annotations provided by an LLM, governed by the license applicable to the underlying LLM used for annotation ([Llama 3.1 Community License Agreement](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/LLAMA_3) or [Gemma Terms of Use](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/GEMMA_TERMS_OF_USE.md)).
+
+ These classifiers were developed as part of the quality-filtering process for Swallow Corpus Version 2, which was used to train the [Llama 3.1 Swallow](https://huggingface.co/collections/tokyotech-llm/llama-31-swallow-66fd4f7da32705cadd1d5bc6) series. In our ablation experiments, filtering the corpus by the classifiers’ scores improved the LLM’s Japanese knowledge.
+
+ ### How to use
+
+ The Wiki-based classifier outputs the probability, between 0 and 1, that a given document resembles Wikipedia content. The LLM-based classifier, in contrast, treats educational scoring as a four-class classification problem and predicts which of the four labels (0, 1, 2, or 3) a document’s educational score belongs to. The expected value of the score under the predicted label probabilities, which ranges from 0 to 3, can be used as the final score.
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import fasttext
+
+ # Download one of the classifier files in this repository: wiki.bin, llm_llama.bin, or llm_gemma.bin
+ model = fasttext.load_model(hf_hub_download("tokyotech-llm/edu-classifier", "wiki.bin"))
+ ```
+
+ ### Best practice
+
+ To assign appropriately ranked scores to a wide range of documents, we recommend the LLM-based classifier. The Wiki-based classifier tends to assign scores close to 0 to most documents, so it is best suited to detecting the few documents that resemble Wikipedia. The LLM-based classifier, in contrast, grades documents against a broader definition of educational value.
+
+ ## Training
+
+ ### Wiki-based classifier
+
+ ### LLM-based classifier
+
+ ## How to cite
+
+ ```bibtex
+ @inproceedings{hattori-2025-swallow-v2,
+   author = {服部 翔 and 岡崎 直観 and 水木 栄 and 藤井 一喜 and 中村 泰士 and 大井 聖也 and 塩谷 泰平 and 齋藤 幸史郎 and Youmi Ma and 前田 航希 and 岡本 拓己 and 石田 茂樹 and 横田 理央 and 高村 大也},
+   title = {Swallowコーパスv2: 教育的な日本語ウェブコーパスの構築},
+   booktitle = {言語処理学会第31回年次大会 (NLP2025)},
+   month = mar,
+   year = {2025},
+ }
+ ```
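The scoring rule described under "How to use" (the expected value over the four labels) can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes the LLM-based models emit fastText's default `__label__0` to `__label__3` labels, and the helper names `expected_score` and `score_document` are hypothetical. The README does not specify any text preprocessing (for example, word segmentation), so none is applied here.

```python
from typing import Sequence

# fastText's default label prefix; assumed (not confirmed) to be used by these models.
LABEL_PREFIX = "__label__"


def expected_score(labels: Sequence[str], probs: Sequence[float]) -> float:
    """Return sum_k k * p(k), normalised by the total predicted probability."""
    total = float(sum(probs))
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum(
        int(label.removeprefix(LABEL_PREFIX)) * float(p)
        for label, p in zip(labels, probs)
    )
    return weighted / total


def score_document(model, text: str) -> float:
    """Score one document with an already loaded LLM-based fastText model."""
    # fastText's predict() expects a single line, so newlines become spaces.
    # k=-1 asks for the probabilities of all labels, not just the top one.
    labels, probs = model.predict(text.replace("\n", " "), k=-1)
    return expected_score(labels, probs)
```

With a model loaded through `fasttext.load_model`, `score_document(model, text)` would return a value between 0 and 3 under these assumptions. The Wiki-based classifier needs no such aggregation, since its probability can be read from `model.predict` directly.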
README_ja.md ADDED
@@ -0,0 +1,17 @@
+ # Swallow Edu Classifier
+
+ ## モデルの概要
+
+ **注意**:日本語でのみ動作します。英語やそれ以外の言語での品質は保証しません。
+
+ 日本語ウェブページの教育的価値を判定する fastText 分類器です。本リポジトリには学術カテゴリに属する日本語 Wikipedia テキストを元に訓練された分類器(Wiki-based classifier)と、LLM によるアノテーションを元に訓練された分類器(LLM-based classifier)が含まれます。前者には [Apache-2.0 ライセンス](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/APACHE_LICENSE_VERSION_2.0.md)、後者にはアノテーションに使用された LLM に応じたライセンス([Llama 3.1 Community License Agreement](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/LLAMA_3.1_COMMUNITY_LICENSE_AGREEMENT.md)、[Gemma Terms of Use](https://huggingface.co/datasets/tokyotech-llm/edu-classifier/blob/main/GEMMA_TERMS_OF_USE.md))が適用されます。
+
+ これらの分類器は[Llama 3.1 Swallow](https://huggingface.co/collections/tokyotech-llm/llama-31-swallow-66fd4f7da32705cadd1d5bc6)シリーズの訓練に用いられた Swallow Corpus Version 2 の品質フィルタリングの一環として開発されました。Ablation 実験では、分類器のスコアに基づくフィルタリングの適用により、LLM の日本語知識が向上することを確認しました。
+
+ ### 使用法
+
+ Wiki-based classifier は与えられた文書が Wikipedia らしいかどうかを 0〜1 の確率で出力します。一方、LLM-based classifier は与えられた文書の教育的スコアが 0、1、2、3 のいずれに属するかどうかを 4 ラベル分類問題として予測します。各ラベルの予測確率に基づくスコアの期待値(0〜3)を最終的なスコアとして用いることができます。
+
+ ### ベストプラクティス
+
+ 広範な文書に適切な序列のスコアを付与したい場合、LLM-based classifier の使用を推奨します。Wiki-based classifier はほとんどの文書に 0 付近のスコアを付与する傾向にあるため、Wikipedia らしいわずかな文書の検出に特化しています。一方、LLM-based classifier はより一般的な教育的価値の定義に基づいた採点をすることができます。
llm_gemma.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fda965e6a4e8057d9d37d7574a7a212533fcf8b364499187b1f80c8e36e2689
+ size 5533888560
llm_llama.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecc855b99598240e612e5d222cf0ffbb5963987e29ee6a077750812ab41963bf
+ size 5557759240
utils/prompt.md ADDED
@@ -0,0 +1,26 @@
+ Below is an extract from a web page. As an experienced teacher with a focus on higher education, evaluate the educational value of the given text using the additive 3-point scoring system described below.
+
+ ### Evaluation Criteria
+
+ 1. **Highly Educational Topic** (1 point):
+ - The extract provides objective facts or knowledge that are important for university students to acquire a broad education and has high educational value. It helps build a crucial foundation for academic learning and social life and has broad applicability. For example, it includes knowledge related to business, accounting, philosophy, everyday trivia, science, social sciences, humanities, law, technology, health, etc.
+ 2. **Provides Deep Insights or Discussions** (1 point):
+ - The extract consistently offers detailed information and explanations on educational topics. It goes beyond merely handling words or concepts superficially, providing deep insights or discussions, and thus has high educational value.
+ 3. **Clear Explanation for General Audience** (1 point):
+ - The extract provides clear and simple explanations on educational topics, making it easy for the general public, who are not experts in the field, to understand the content well.
+
+ ### Evaluation Method
+
+ 1. Evaluate the text on a 3-point scale based on the above criteria.
+ 2. Add 1 point for each criterion met (a maximum of 3 points if all criteria are met).
+
+ ### Output Format
+
+ 1. First, briefly explain the evaluation results (0 points or 1 point) for each of the three criteria and the reasons for each.
+ 2. Finally, state the total score in the format "Educational Score: <total points>".
+
+ ### Extract
+
+ {TEXT}
+
+ ### Output
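To illustrate how this template might be used, the sketch below fills the `{TEXT}` placeholder and extracts the total from a reply that follows the requested "Educational Score: <total points>" format. The function names and the regular expression are assumptions made for illustration; the commit does not show the authors' annotation or parsing code.

```python
import re
from typing import Optional

# Matches the closing line requested by the prompt, e.g. "Educational Score: 2".
# Only the values 0-3 are accepted, since the scale is additive over three criteria.
SCORE_PATTERN = re.compile(r"Educational Score:\s*([0-3])\b")


def build_prompt(template: str, text: str) -> str:
    """Insert the web-page extract into the template's {TEXT} placeholder."""
    # str.replace is used instead of str.format so braces inside `text` are left alone.
    return template.replace("{TEXT}", text)


def parse_score(reply: str) -> Optional[int]:
    """Return the last score (0-3) stated in the reply, or None if none is found."""
    matches = SCORE_PATTERN.findall(reply)
    return int(matches[-1]) if matches else None
```

Replies without a parseable score yield `None`, so a caller can decide whether to drop such documents or annotate them again.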
wiki.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e493dedcb5da2a04eadb5cda9d90161072f71df8f4615acb0088e59282f5d01
+ size 2351720465
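The three `.bin` entries in this commit are Git LFS pointer files rather than the models themselves: each records the spec version, a SHA-256 object ID, and the size in bytes. As a hedged sketch, a downloaded file could be checked against its pointer as follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS or `huggingface_hub`.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(pointer_text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and hash against a parsed pointer."""
    path = Path(path)
    # Comparing sizes first avoids hashing multi-gigabyte files that cannot match.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so large model files are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Checking the size before hashing is a deliberate shortcut here, because the model files listed above are between roughly 2 GB and 6 GB.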