|
--- |
|
base_model: Alibaba-NLP/gte-large-en-v1.5 |
|
library_name: sentence-transformers |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
- dot_accuracy@1 |
|
- dot_accuracy@3 |
|
- dot_accuracy@5 |
|
- dot_accuracy@10 |
|
- dot_precision@1 |
|
- dot_precision@3 |
|
- dot_precision@5 |
|
- dot_precision@10 |
|
- dot_recall@1 |
|
- dot_recall@3 |
|
- dot_recall@5 |
|
- dot_recall@10 |
|
- dot_ndcg@10 |
|
- dot_mrr@10 |
|
- dot_map@100 |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:586 |
|
- loss:MultipleNegativesRankingLoss |
|
widget: |
|
- source_sentence: Explain the spectrum of openness in AI systems as described in |
|
the document. How do open-source AI systems differ from fully closed AI systems |
|
in terms of accessibility and innovation? |
|
sentences: |
|
- 'targets of cyber attacks; or |
|
|
|
(iii) permitting the evasion of human control or oversight through |
|
|
|
means of deception or obfuscation. |
|
|
|
Models meet this definition even if they are provided to end users with |
|
|
|
technical safeguards that attempt to prevent users from taking advantage of |
|
|
|
the relevant unsafe capabilities. |
|
|
|
(l) The term “Federal law enforcement agency” has the meaning set forth |
|
|
|
in section 21(a) of Executive Order 14074 of May 25, 2022 (Advancing |
|
|
|
Effective, Accountable Policing and Criminal Justice Practices To Enhance |
|
|
|
Public Trust and Public Safety). |
|
|
|
(m) The term “floating-point operation” means any mathematical |
|
|
|
operation or assignment involving floating-point numbers, which are a |
|
|
|
subset of the real numbers typically represented on computers by an integer |
|
|
|
of fixed precision scaled by an integer exponent of a fixed base. |
|
|
|
(n) The term “foreign person” has the meaning set forth in section 5(c) |
|
of |
|
|
|
Executive Order 13984 of January 19, 2021 (Taking Additional Steps To |
|
|
|
Address the National Emergency With Respect to Significant Malicious |
|
|
|
Cyber-Enabled Activities). |
|
|
|
(o) The terms “foreign reseller” and “foreign reseller of United States |
|
|
|
Infrastructure as a Service Products” mean a foreign person who has |
|
|
|
established an Infrastructure as a Service Account to provide Infrastructure |
|
|
|
as a Service Products subsequently, in whole or in part, to a third party. |
|
|
|
(p) The term “generative AI” means the class of AI models that emulate |
|
|
|
the structure and characteristics of input data in order to generate derived |
|
|
|
synthetic content. This can include images, videos, audio, text, and other |
|
|
|
digital content. |
|
|
|
(q) The terms “Infrastructure as a Service Product,” “United States |
|
|
|
Infrastructure as a Service Product,” “United States Infrastructure as a |
|
|
|
Service Provider,” and “Infrastructure as a Service Account” each have the |
|
|
|
respective meanings given to those terms in section 5 of Executive Order |
|
|
|
13984. |
|
|
|
(r) The term “integer operation” means any mathematical operation or |
|
|
|
assignment involving only integers, or whole numbers expressed without a |
|
|
|
decimal point.05/10/2024, 16:36 Executive Order on the Safe, Secure, and Trustworthy |
|
Development and Use of Artificial Intelligence | The White House |
|
|
|
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artific… |
|
7/59' |
|
- "AI safety, enable next-generation medical diagnoses and further other\ncritical\ |
|
\ AI priorities.\n\0\0 Released a for designing safe, secure, and trustworthy\ |
|
\ AI tools\nfor use in education. The Department of Education’s guide discusses\n\ |
|
how developers of educational technologies can design AI that benefits\nstudents\ |
|
\ and teachers while advancing equity, civil rights, trust, and\ntransparency.\ |
|
\ This work builds on the Department’s 2023 \noutlining recommendations for the\ |
|
\ use of AI in teaching and learning.\n\0\0 Published guidance on evaluating the\ |
|
\ eligibility of patent claims\ninvolving inventions related to AI technology, as\ |
|
\ well as other\nemerging technologies. The guidance by the U.S. Patent and Trademark\n\ |
|
Office will guide those inventing in the AI space to protect their AI\ninventions\ |
|
\ and assist patent examiners reviewing applications for\npatents on AI inventions.\n\ |
|
\0\0 Issued a on federal research and development (R&D) to\nadvance trustworthy\ |
|
\ AI over the past four years. The report by the\nNational Science and Technology\ |
|
\ Council examines an annual federal AI\nR&D budget of nearly $3 billion.\n\0\0\ |
|
\ Launched a $23 million initiative to promote the use of privacy-\nenhancing\ |
|
\ technologies to solve real-world problems, including\nrelated to AI. Working\ |
|
\ with industry and agency partners, NSF will\ninvest through its new Privacy-preserving\ |
|
\ Data Sharing in Practice\nprogram in efforts to apply, mature, and scale privacy-enhancing\n\ |
|
technologies for specific use cases and establish testbeds to accelerate\ntheir\ |
|
\ adoption.\n\0\0 Announced millions of dollars in further investments to advance\n\ |
|
responsible AI development and use throughout our society. These\ninclude $30\ |
|
\ million invested through NSF’s Experiential Learning in\nEmerging and Novel\ |
|
\ Technologies program—which supports inclusive\nexperiential learning in fields\ |
|
\ like AI—and $10 million through NSF’s\nExpandAI program, which helps build capacity\ |
|
\ in AI research at\nminority-serving institutions while fostering the development\ |
|
\ of a\ndiverse, AI-ready workforce.\nAdvancing U.S. Leadership Abroad\nPresident\ |
|
\ Biden’s Executive Order emphasized that the United States lead\nglobal efforts\ |
|
\ to unlock AI’s potential and meet its challenges. To advance\nU.S. leadership\ |
|
\ on AI, agencies have:guide\nreport\nreport05/10/2024, 16:35 FACT SHEET: Biden-Harris\ |
|
\ Administration Announces New AI Actions and Receives Additional Major Voluntary\ |
|
\ Commitment on AI | The…\nhttps://www.whitehouse.gov/briefing-room/statements-releases/2024/07/26/fact-sheet-biden-harris-administration-announces-new-ai-actions-and-receives-addit…\ |
|
\ 4/10" |
|
- "50 Governing AI for Humanity processes such as the recent scientific report\ |
|
\ \non the risks of advanced AI commissioned by \nthe United Kingdom,25 and relevant\ |
|
\ regional \norganizations.\ne. A steering committee would develop a research\ |
|
\ \nagenda ensuring the inclusivity of views and \nincorporation of ethical considerations,\ |
|
\ oversee \nthe allocation of resources, foster collaboration \nwith a network\ |
|
\ of academic institutions and \nother stakeholders, and review the panel’s \n\ |
|
activities and deliverables.100 By drawing on the unique convening power of the\ |
|
\ \nUnited Nations and inclusive global reach across \nstakeholder groups, an\ |
|
\ international scientific panel \ncan deliver trusted scientific collaboration\ |
|
\ processes \nand outputs and correct information asymmetries \nin ways that address\ |
|
\ the representation and \ncoordination gaps identified in paragraphs 66 and \n\ |
|
73, thereby promoting equitable and effective \ninternational AI governance.\n\ |
|
Among the topics discussed in our consultations was the ongoing debate over open\ |
|
\ versus closed AI systems. \nAI systems that are open in varying degrees are\ |
|
\ often referred to as “open-source AI”, but this is somewhat of a \nmisnomer\ |
|
\ when compared with open-source software (code). It is important to recognize\ |
|
\ that openness in AI \nsystems is more of a spectrum than a single attribute.\n\ |
|
One article explained that a “fully closed AI system is only accessible to a particular\ |
|
\ group. It could be an AI \ndeveloper company or a specific group within it,\ |
|
\ mainly for internal research and development purposes. On the \nother hand,\ |
|
\ more open systems may allow public access or make available certain parts, such\ |
|
\ as data, code, or \nmodel characteristics, to facilitate external AI development.”a\n\ |
|
Open-source AI systems in the generative AI field present both risks and opportunities.\ |
|
\ Companies often cite “AI \nsafety” as a reason for not disclosing system specifications,\ |
|
\ reflecting the ongoing tension between open and \nclosed approaches in the industry.\ |
|
\ Debates typically revolve around two extremes: full openness, which entails\ |
|
\ \nsharing all model components and data sets; and partial openness, which involves\ |
|
\ disclosing only model weights. \nOpen-source AI systems encourage innovation\ |
|
\ and are often a requirement for public funding. On the open \nextreme of the\ |
|
\ spectrum, when the underlying code is made freely available, developers around\ |
|
\ the world can \nexperiment, improve and create new applications. This fosters\ |
|
\ a collaborative environment where ideas and \nexpertise are readily shared.\ |
|
\ Some industry leaders argue that this openness is vital to innovation and economic\ |
|
\ \ngrowth.\nHowever, in most cases, open-source AI models are available as application\ |
|
\ programming interfaces. In this case, \nthe original code is not shared, the\ |
|
\ original weights are never changed and model updates become new models. \nAdditionally,\ |
|
\ open-source models tend to be smaller and more transparent. This transparency\ |
|
\ can build trust, \nallow for ethical considerations to be proactively addressed,\ |
|
\ and support validation and replication because users \ncan examine the inner\ |
|
\ workings of the AI system, understand its decision-making process and identify\ |
|
\ potential \nbiases.Box 9: Open versus closed AI systems\na Angela Luna, “The\ |
|
\ open or closed AI dilemma”, 2 May 2024. Available at https://bipartisanpolicy.org/blog/the-open-or-closed-ai-dilemma\ |
|
\ .\n25 International Scientific Report on the Safety of Advanced AI: Interim\ |
|
\ Report. Available at https://gov.uk/government/publications/international-scientific-report-\n\ |
|
on-the-safety-of-advanced-ai ." |
|
- source_sentence: What role does the report propose for the United Nations in establishing |
|
a governance regime for AI, and how does it envision this regime contributing |
|
to a new social contract that protects vulnerable populations? |
|
sentences: |
|
- "HUMAN ALTERNATIVES, \nCONSIDERATION, AND \nFALLBACK \nHOW THESE PRINCIPLES CAN\ |
|
\ MOVE INTO PRACTICE\nReal-life examples of how these principles can become reality,\ |
|
\ through laws, policies, and practical \ntechnical and sociotechnical approaches\ |
|
\ to protecting rights, opportunities, and access. \nHealthcare “navigators” help\ |
|
\ people find their way through online signup forms to choose \nand obtain healthcare.\ |
|
\ A Navigator is “an individual or organization that's trained and able to help\ |
|
\ \nconsumers, small businesses, and their employees as they look for health coverage\ |
|
\ options through the \nMarketplace (a government web site), including completing\ |
|
\ eligibility and enrollment forms.”106 For \nthe 2022 plan year, the Biden-Harris\ |
|
\ Administration increased funding so that grantee organizations could \n“train\ |
|
\ and certify more than 1,500 Navigators to help uninsured consumers find affordable\ |
|
\ and comprehensive \nhealth coverage. ”107\nThe customer service industry has\ |
|
\ successfully integrated automated services such as \nchat-bots and AI-driven\ |
|
\ call response systems with escalation to a human support team.\n108 Many businesses\ |
|
\ now use partially automated customer service platforms that help answer customer\ |
|
\ \nquestions and compile common problems for human agents to review. These integrated\ |
|
\ human-AI \nsystems allow companies to provide faster customer care while maintaining\ |
|
\ human agents to answer \ncalls or otherwise respond to complicated requests.\ |
|
\ Using both AI and human agents is viewed as key to \nsuccessful customer service.109\n\ |
|
Ballot curing laws in at least 24 states require a fallback system that allows\ |
|
\ voters to \ncorrect their ballot and have it counted in the case that a voter\ |
|
\ signature matching algorithm incorrectly flags their ballot as invalid or there\ |
|
\ is another issue with their ballot, and review by an election official does\ |
|
\ not rectify the problem. Some federal courts have found that such cure procedures\ |
|
\ are constitutionally required.\n110 Ballot \ncuring processes vary among states,\ |
|
\ and include direct phone calls, emails, or mail contact by election \nofficials.111\ |
|
\ Voters are asked to provide alternative information or a new signature to verify\ |
|
\ the validity of their \nballot. \n52" |
|
- "SECTION TITLE\nHUMAN ALTERNATIVES , C ONSIDERATION , AND FALLBACK\nYou should\ |
|
\ be able to opt out, where appropriate, and have access to a person who can quickly\ |
|
\ \nconsider and remedy problems you encounter. You should be able to opt out\ |
|
\ from automated systems in \nfavor of a human alternative, where appropriate.\ |
|
\ Appropriateness should be determined based on reasonable expectations in a given\ |
|
\ context and with a focus on ensuring broad accessibility and protecting the\ |
|
\ public from especially harmful impacts. In some cases, a human or other alternative\ |
|
\ may be required by law. You should have access to timely human consideration\ |
|
\ and remedy by a fallback and escalation process if an automated system fails,\ |
|
\ it produces an error, or you would like to appeal or contest its impacts on\ |
|
\ you. Human consideration and fallback should be accessible, equitable, effective,\ |
|
\ maintained, accompanied by appropriate operator training, and should not impose\ |
|
\ an unreasonable burden on the public. Automated systems with an intended use\ |
|
\ within sensi\n-\ntive domains, including, but not limited to, criminal justice,\ |
|
\ employment, education, and health, should additional -\nly be tailored to the\ |
|
\ purpose, provide meaningful access for oversight, include training for any people\ |
|
\ interacting with the system, and incorporate human consideration for adverse\ |
|
\ or high-risk decisions. Reporting that includes a description of these human\ |
|
\ governance processes and assessment of their timeliness, accessibility, outcomes,\ |
|
\ and effectiveness should be made public whenever possible. \nDefinitions for\ |
|
\ key terms in The Blueprint for an AI Bill of Rights can be found in Applying\ |
|
\ the Blueprint for an AI Bill of Rights. \nAccompanying analysis and tools for\ |
|
\ actualizing each principle can be found in the Technical Companion. \n7" |
|
- "Final Report 21E. Reflections on institutional \nmodels\nlxiv Discussions\ |
|
\ about AI often resolve into extremes. \nIn our consultations around the world,\ |
|
\ we engaged \nwith those who see a future of boundless goods \nprovided by ever-cheaper,\ |
|
\ ever-more-helpful AI \nsystems. We also spoke with those wary of darker \nfutures,\ |
|
\ of division and unemployment, and even \nextinction.8\nlxv We do not know whether\ |
|
\ the utopian or dystopian \nfuture is more likely. Equally, we are mindful that\ |
|
\ \nthe technology may go in a direction that does \naway with this duality. This\ |
|
\ report focuses on \nthe near-term opportunities and risks, based on \nscience\ |
|
\ and grounded in fact. \nlxvi The seven recommendations outlined above offer\ |
|
\ \nour best hope for reaping the benefits of AI, while \nminimizing and mitigating\ |
|
\ the risks, as AI continues \nevolving. We are also mindful of the practical\ |
|
\ \nchallenges to international institution-building \non a larger scale. This\ |
|
\ is why we are proposing a \nnetworked institutional approach, with light and\ |
|
\ \nagile support. If or when risks become more acute \nand the stakes for opportunities\ |
|
\ escalate, such \ncalculations may change. \nlxvii The world wars led to the\ |
|
\ modern international \nsystem; the development of ever-more-powerful \nchemical,\ |
|
\ biological and nuclear weapons led \nto regimes limiting their spread and promoting\ |
|
\ \npeaceful uses of the underlying technologies. \nEvolving understanding of\ |
|
\ our common humanity \nled to the modern human rights system and our \nongoing\ |
|
\ commitment to the SDGs for all. Climate \nchange evolved from a niche concern\ |
|
\ to a global \nchallenge.lxviii AI may similarly rise to a level that requires\ |
|
\ more \nresources and more authority than is proposed \nin the above-mentioned\ |
|
\ recommendations, \ninto harder functions of norm elaboration, \nimplementation,\ |
|
\ monitoring, verification and \nvalidation, enforcement, accountability, remedies\ |
|
\ \nfor harm and emergency responses. Reflecting on \nsuch institutional models,\ |
|
\ therefore, is prudent. The \nfinal section of this report seeks to contribute\ |
|
\ to \nthat effort.\n4. A call to action\nlxix We remain optimistic about the\ |
|
\ future with AI and \nits positive potential. That optimism depends, \nhowever,\ |
|
\ on realism about the risks and the \ninadequacy of structures and incentives\ |
|
\ currently \nin place. The technology is too important, and the \nstakes are\ |
|
\ too high, to rely only on market forces \nand a fragmented patchwork of national\ |
|
\ and \nmultilateral action.\nlxx The United Nations can be the vehicle for a\ |
|
\ new \nsocial contract for AI that ensures global buy-\nin for a governance regime\ |
|
\ which protects and \nempowers us all. Such a social contract will ensure \n\ |
|
that opportunities are fairly distributed, and the \nrisks are not loaded on to\ |
|
\ the most vulnerable – or \npassed on to future generations, as we have seen,\ |
|
\ \ntragically, with climate change.\nlxxi As a group and as individuals from\ |
|
\ across many \nfields of expertise, organizations and parts of the \nworld, we\ |
|
\ look forward to continuing this crucial \nconversation. Together with the many\ |
|
\ others we \nhave connected with on this journey, and the global \ncommunity\ |
|
\ they represent, we hope that this report \ncontributes to our combined efforts\ |
|
\ to govern AI \nfor humanity.\n8 See https://safe.ai/work/statement-on-ai-risk\ |
|
\ ." |
|
- source_sentence: What are the potential consequences of coordination gaps between |
|
various AI governance initiatives, as highlighted in the context information? |
|
sentences: |
|
- "44 Governing AI for Humanity B. Coordination gaps\n72 The ongoing emergence\ |
|
\ and evolution of AI \ngovernance initiatives are not guaranteed to \nwork together\ |
|
\ effectively for humanity. Instead, \ncoordination gaps have appeared. Effective\ |
|
\ \nhandshaking between the selective plurilateral \ninitiatives (see fig. 8)\ |
|
\ and other regional initiatives is \nnot assured, risking incompatibility between\ |
|
\ regions.\n73 Nor are there global mechanisms for all international \nstandards\ |
|
\ development organizations (see fig. 7), \ninternational scientific research\ |
|
\ initiatives or AI \ncapacity-building initiatives to coordinate with each \n\ |
|
other, undermining interoperability of approaches \nand resulting in fragmentation.\ |
|
\ The resulting \ncoordination gaps between various sub-global \ninitiatives are\ |
|
\ in some cases best addressed at the \nglobal level.\n74 A separate set of coordination\ |
|
\ gaps arise within \nthe United Nations system, reflected in the array of \n\ |
|
diverse United Nations documents and initiatives \nin relation to AI. Figure 9\ |
|
\ shows 27 United Nations-\nrelated instruments in specific domains that may \n\ |
|
apply to AI – 23 of them are binding and will require \ninterpretation as they\ |
|
\ pertain to AI. A further 29 \ndomain-level documents from the United Nations\ |
|
\ \nand related organizations focus specifically on AI, \nnone of which are binding.17\ |
|
\ In some cases, these \ncan address AI risks and harness AI benefits in \nspecific\ |
|
\ domains.75 The level of activity shows the importance of AI \nto United Nations\ |
|
\ programmes. As AI expands to \naffect ever-wider aspects of society, there will\ |
|
\ be \ngrowing calls for diverse parts of the United Nations \nsystem to act,\ |
|
\ including through binding norms. \nIt also shows the ad hoc nature of the responses,\ |
|
\ \nwhich have largely developed organically in specific \ndomains and without\ |
|
\ an overarching strategy. The \nresulting coordination gaps invite overlaps and\ |
|
\ \nhinder interoperability and impact.\n76 The number and diversity of approaches\ |
|
\ are a sign \nthat the United Nations system is responding to \nan emerging issue.\ |
|
\ With proper orchestration, and \nin combination with processes taking a holistic\ |
|
\ \napproach, these efforts can offer an efficient and \nsustainable pathway to\ |
|
\ inclusive international AI \ngovernance in specific domains. This could enable\ |
|
\ \nmeaningful, harmonized and coordinated impacts \non areas such as health,\ |
|
\ education, technical \nstandards and ethics, instead of merely contributing\ |
|
\ \nto the proliferation of initiatives and institutions \nin this growing field.\ |
|
\ International law, including \ninternational human rights law, provides a shared\ |
|
\ \nnormative foundation for all AI-related efforts, \nthereby facilitating coordination\ |
|
\ and coherence." |
|
- "\0\0 Issued a comprehensive plan for U.S. engagement on global AI\nstandards. The\ |
|
\ plan, developed by the NIST, incorporates broad public\nand private-sector input,\ |
|
\ identifies objectives and priority areas for AI\nstandards work, and lays out\ |
|
\ actions for U.S. stakeholders including U.S.\nagencies. NIST and others agencies\ |
|
\ will report on priority actions in 180\ndays. \n\0\0 Developed for managing\ |
|
\ risks to human rights posed by AI.\nThe Department of State’s “Risk Management\ |
|
\ Profile for AI and Human\nRights”—developed in close coordination with NIST and\ |
|
\ the U.S. Agency\nfor International Development—recommends actions based on the\ |
|
\ NIST\nAI Risk Management Framework to governments, the private sector, and\n\ |
|
civil society worldwide, to identify and manage risks to human rights\narising\ |
|
\ from the design, development, deployment, and use of AI. \n\0\0 Launched a global\ |
|
\ network of AI Safety Institutes and other\ngovernment-backed scientific offices\ |
|
\ to advance AI safety at a technical\nlevel. This network will accelerate critical\ |
|
\ information exchange and\ndrive toward common or compatible safety evaluations\ |
|
\ and policies.\n\0\0 Launched a landmark United Nations General Assembly resolution.\n\ |
|
The unanimously adopted resolution, with more than 100 co-sponsors,\nlays out\ |
|
\ a common vision for countries around the world to promote the\nsafe and secure\ |
|
\ use of AI to address global challenges.\n\0\0 Expanded global support for the\ |
|
\ U.S.-led Political Declaration on the\nResponsible Military Use of Artificial\ |
|
\ Intelligence and\nAutonomy. Fifty-five nations now endorse the political declaration,\n\ |
|
which outlines a set of norms for the responsible development,\ndeployment, and\ |
|
\ use of military AI capabilities.\nThe Table below summarizes many of the activities\ |
|
\ that federal agencies\nhave completed in response to the Executive Order:guidance05/10/2024,\ |
|
\ 16:35 FACT SHEET: Biden-Harris Administration Announces New AI Actions and Receives\ |
|
\ Additional Major Voluntary Commitment on AI | The…\nhttps://www.whitehouse.gov/briefing-room/statements-releases/2024/07/26/fact-sheet-biden-harris-administration-announces-new-ai-actions-and-receives-addit…\ |
|
\ 5/10" |
|
- "Final Report 55f. In addition, diverse stakeholders – in particular \ntechnology\ |
|
\ companies and civil society \nrepresentatives – could be invited to engage \n\ |
|
through existing institutions detailed below, as \nwell as policy workshops on\ |
|
\ particular aspects \nof AI governance such as limits (if any) of open-\nsource\ |
|
\ approaches to the most advanced forms \nof AI, thresholds for tracking and reporting\ |
|
\ of \nAI incidents, application of human rights law to \nnovel use cases, or\ |
|
\ the use of competition law/\nantitrust to address concentrations of power \n\ |
|
among technology companies.30\ng. The proposed AI office could also curate a \n\ |
|
repository of AI governance examples, including \nlegislation, policies and institutions\ |
|
\ from \naround the world for consideration of the policy \ndialogue, working\ |
|
\ with existing efforts, such as \nOECD.\n109 Notwithstanding the two General\ |
|
\ Assembly \nresolutions on AI in 2024, there is currently \nno mandated institutionalized\ |
|
\ dialogue on \nAI governance at the United Nations that \ncorresponds to the\ |
|
\ reliably inclusive vision of this \nrecommendation. Similar processes do exist\ |
|
\ at \nthe international level, but primarily in regional or \nplurilateral constellations\ |
|
\ (para. 57), which are not \nreliably inclusive and global.\n110 Complementing\ |
|
\ a fluid process of plurilateral and \nregional AI summits,31 the United Nations\ |
|
\ can \noffer a stable home for dialogue on AI governance. \nInclusion by design\ |
|
\ – a crucial requirement for \nplaying a stabilizing role in geopolitically delicate\ |
|
\ \ntimes – can also address representation and \ncoordination gaps identified\ |
|
\ in paragraphs 64 and \n72, promoting more effective collective action on AI\ |
|
\ \ngovernance in the common interest of all countries. AI standards exchange\ |
|
\ \n \nRecommendation 3: AI standards exchange \n \nWe recommend the creation\ |
|
\ of an AI standards \nexchange, bringing together representatives from \nnational\ |
|
\ and international standard-development \norganizations, technology companies,\ |
|
\ civil society \nand representatives from the international scientific \npanel.\ |
|
\ It would be tasked with:\na. Developing and maintaining a register of \ndefinitions\ |
|
\ and applicable standards for \nmeasuring and evaluating AI systems;\nb. Debating\ |
|
\ and evaluating the standards and the \nprocesses for creating them; and\nc.\ |
|
\ Identifying gaps where new standards are \nneeded.\n111 When AI systems were\ |
|
\ first explored, few standards \nexisted to help to navigate or measure this\ |
|
\ new \nfrontier. The Turing Test – of whether a machine can \nexhibit behaviour\ |
|
\ equivalent to (or indistinguishable \nfrom) a human being – captured the popular\ |
|
\ \nimagination, but is of more cultural than scientific \nsignificance. Indeed,\ |
|
\ it is telling that some of \nthe greatest computational advances have been \n\ |
|
measured by their success in games, such as when \na computer could beat humans\ |
|
\ at chess, Go, poker \nor Jeopardy. Such measures were easily understood \nby\ |
|
\ non-specialists, but were neither rigorous nor \nparticularly scientific.\n\ |
|
112 More recently, there has been a proliferation of \nstandards. Figure 13 illustrates\ |
|
\ the increasing \nnumber of relevant standards adopted by ITU, the \nInternational\ |
|
\ Organization for Standardization (ISO), \nthe International Electrotechnical\ |
|
\ Commission \n(IEC) and the Institute of Electrical and Electronics \nEngineers\ |
|
\ (IEEE).32\n30 Such a gathering could also provide an opportunity for multi-stakeholder\ |
|
\ debate of any hardening of the global governance of AI. These might include,\ |
|
\ for \nexample, prohibitions on the development of uncontainable or uncontrollable\ |
|
\ AI systems, or requirements that all AI systems be sufficiently transparent\ |
|
\ so that \ntheir consequences can be traced back to a legal actor that can assume\ |
|
\ responsibility for them.\n31 Although multiple AI summits have helped a subset\ |
|
\ of 20–30 countries to align on AI safety issues, participation has been inconsistent:\ |
|
\ Brazil, China and \nIreland endorsed the Bletchley Declaration in November 2023,\ |
|
\ but not the Seoul Ministerial Statement six months later (see fig. 12). Conversely,\ |
|
\ Mexico and \nNew Zealand endorsed the Seoul Ministerial Statement, but did not\ |
|
\ endorse the Bletchley Declaration.\n32 Many new standards are also emerging\ |
|
\ at the national and multinational levels, such as the United States White House\ |
|
\ Voluntary AI Commitments and the \nEuropean Union Codes of Practice for the\ |
|
\ AI Act." |
|
- source_sentence: Describe the minimum set of criteria that should be included in |
|
the incident reporting process for GAI systems, according to the organizational |
|
practices established for identifying incidents. |
|
sentences: |
|
- "APPENDIX\nSummaries of Additional Engagements: \n•OSTP created an email address\ |
|
\ ( [email protected] ) to solicit comments from the public on the use of\n\ |
|
artificial intelligence and other data-driven technologies in their lives.\n•OSTP\ |
|
\ issued a Request For Information (RFI) on the use and governance of biometric\ |
|
\ technologies.113 The\npurpose of this RFI was to understand the extent and variety\ |
|
\ of biometric technologies in past, current, or\nplanned use; the domains in\ |
|
\ which these technologies are being used; the entities making use of them; currentprinciples,\ |
|
\ practices, or policies governing their use; and the stakeholders that are, or\ |
|
\ may be, impacted by theiruse or regulation. The 130 responses to this RFI are\ |
|
\ available in full online\n114 and were submitted by the below\nlisted organizations\ |
|
\ and individuals:\nAccenture \nAccess Now ACT | The App Association AHIP \nAIethicist.org\ |
|
\ \nAirlines for America Alliance for Automotive Innovation Amelia Winger-Bearskin\ |
|
\ American Civil Liberties Union American Civil Liberties Union of Massachusetts\ |
|
\ American Medical Association ARTICLE19 Attorneys General of the District of\ |
|
\ Columbia, Illinois, Maryland, Michigan, Minnesota, New York, North Carolina,\ |
|
\ Oregon, Vermont, and Washington Avanade Aware Barbara Evans Better Identity\ |
|
\ Coalition Bipartisan Policy Center Brandon L. Garrett and Cynthia Rudin Brian\ |
|
\ Krupp Brooklyn Defender Services BSA | The Software Alliance Carnegie Mellon\ |
|
\ University Center for Democracy & Technology Center for New Democratic Processes\ |
|
\ Center for Research and Education on Accessible Technology and Experiences at\ |
|
\ University of Washington, Devva Kasnitz, L Jean Camp, Jonathan Lazar, Harry\ |
|
\ Hochheiser Center on Privacy & Technology at Georgetown Law Cisco Systems City\ |
|
\ of Portland Smart City PDX Program CLEAR Clearview AI Cognoa Color of Change\ |
|
\ Common Sense Media Computing Community Consortium at Computing Research Association\ |
|
\ Connected Health Initiative Consumer Technology Association Courtney Radsch\ |
|
\ Coworker Cyber Farm Labs Data & Society Research Institute Data for Black Lives\ |
|
\ Data to Actionable Knowledge Lab at Harvard University Deloitte Dev Technology\ |
|
\ Group Digital Therapeutics Alliance Digital Welfare State & Human Rights Project\ |
|
\ and Center for Human Rights and Global Justice at New York University School\ |
|
\ of Law, and Temple University Institute for Law, Innovation & Technology Dignari\ |
|
\ Douglas Goddard Edgar Dworsky Electronic Frontier Foundation Electronic Privacy\ |
|
\ Information Center, Center for Digital Democracy, and Consumer Federation of\ |
|
\ America FaceTec Fight for the Future Ganesh Mani Georgia Tech Research Institute\ |
|
\ Google Health Information Technology Research and Development Interagency Working\ |
|
\ Group HireVue HR Policy Association ID.me Identity and Data Sciences Laboratory\ |
|
\ at Science Applications International Corporation Information Technology and\ |
|
\ Innovation Foundation Information Technology Industry Council Innocence Project\ |
|
\ Institute for Human-Centered Artificial Intelligence at Stanford University\ |
|
\ Integrated Justice Information Systems Institute International Association of\ |
|
\ Chiefs of Police International Biometrics + Identity Association International\ |
|
\ Business Machines Corporation International Committee of the Red Cross Inventionphysics\ |
|
\ iProov Jacob Boudreau Jennifer K. Wagner, Dan Berger, Margaret Hu, and Sara\ |
|
\ Katsanis Jonathan Barry-Blocker Joseph Turow Joy Buolamwini Joy Mack Karen Bureau\ |
|
\ Lamont Gholston Lawyers’ Committee for Civil Rights Under Law \n60" |
|
- "19 GV-4.1-003 Establish policies, procedures, and processes for oversight functions\ |
|
\ (e.g., senior \nleadership, legal, compliance, including internal evaluation\ |
|
\ ) across the GAI \nlifecycle, from problem formulation and supply chains to\ |
|
\ system decommission. Value Chain and Component \nIntegration \nAI Actor Tasks:\ |
|
\ AI Deployment, AI Design, AI Development, Operation and Monitoring \n \nGOVERN\ |
|
\ 4.2: Organizational teams document the risks and potential impacts of the AI\ |
|
\ technology they design, develop, deploy, \nevaluate, and use, and they communicate\ |
|
\ about the impacts more broadly. \nAction ID Suggested Action GAI Risks \n\ |
|
GV-4.2-001 Establish terms of use and terms of service for GAI systems . Intellectual\ |
|
\ Property ; Dangerous , \nViolent, or Hateful Content ; \nObscene, Degrading,\ |
|
\ and/or \nAbusive Content \nGV-4.2-002 Include relevant AI Actors in the GAI\ |
|
\ system risk identification process. Human -AI Configuration \nGV-4.2-0 03 Verify\ |
|
\ that downstream GAI system impacts (such as the use of third -party \nplugins)\ |
|
\ are included in the impact documentation process. Value Chain and Component\ |
|
\ \nIntegration \nAI Actor Tasks: AI Deployment, AI Design, AI Development,\ |
|
\ Operation and Monitoring \n \nGOVERN 4.3: Organizational practices are in place\ |
|
\ to enable AI testing, identification of incidents, and information sharing. \ |
|
\ \nAction ID Suggested Action GAI Risks \nGV4.3-- 001 Establish policies for\ |
|
\ measuring the effectiveness of employed content \nprovenance methodologies (e.g.,\ |
|
\ cryptography, watermarking, steganography, etc.) Information Integrity \nGV-4.3-002\ |
|
\ Establish o rganizational practices to identify the minimum set of criteria\ |
|
\ \nnecessary for GAI system incident reporting such as: System ID (auto -generated\ |
|
\ \nmost likely), Title, Reporter, System/Source, Data Reported, Date of Incident,\ |
|
\ Description, Impact(s), Stakeholder(s) Impacted. Information Security" |
|
- "72 Governing AI for Humanity Box 15: Possible functions and first-year deliverables\ |
|
\ of the AI office\nThe AI office should have a light structure and aim to be\ |
|
\ agile, trusted and networked. Where necessary, it should \noperate in a “hub\ |
|
\ and spoke” manner to connect to other parts of the United Nations system and\ |
|
\ beyond.\nOutreach could include serving as a key node in a so-called soft coordination\ |
|
\ architecture between Member \nStates, plurilateral networks, civil society organizations,\ |
|
\ academia and technology companies in a regime complex \nthat weaves together\ |
|
\ to solve problems collaboratively through networking, and as a safe, trusted\ |
|
\ place to \nconvene on relevant topics. Ambitiously, it could become the glue\ |
|
\ that helps to hold such other evolving networks \ntogether.\nSupporting the\ |
|
\ various initiatives proposed in this report includes the important function\ |
|
\ of ensuring inclusiveness \nat speed in delivering outputs such as scientific\ |
|
\ reports, governance dialogue and identifying appropriate follow-\nup entities.\n\ |
|
Common understanding :\n• Facilitate recruitment of and support the international\ |
|
\ scientific panel.\nCommon ground :\n• Service policy dialogues with multi-stakeholder\ |
|
\ inputs in support of interoperability and policy learning. \nAn initial priority\ |
|
\ topic is the articulation of risk thresholds and safety frameworks across jurisdictions\n\ |
|
• Support ITU, ISO/IEC and IEEE on setting up the AI standards exchange.\nCommon\ |
|
\ benefits :\n• Support the AI capacity development network with an initial focus\ |
|
\ on building public interest AI capacity \namong public officials and social\ |
|
\ entrepreneurs. Define the initial network vision, outcomes, go vernance \nstructure,\ |
|
\ partnerships and operational mechanisms.\n• Define the vision, outcomes, governance\ |
|
\ structure and operational mechanisms for the global fund for AI, \nand seek\ |
|
\ feedback from Member States, industry and civil society stakeholders on the\ |
|
\ proposal, with a \nview to funding initial projects within six months of establishment.\n\ |
|
• Prepare and publish an annual list of prioritized investment areas to guide\ |
|
\ both the global fund for AI and \ninvestments outside that structure.\nCoherent\ |
|
\ effort :\n• Establish lightweight mechanisms that support Member States and\ |
|
\ other relevant organizations to be \nmore connected, coordinated and effective\ |
|
\ in pursuing their global AI governance efforts.\n• Prepare initial frameworks\ |
|
\ to guide and monitor the AI office’s work, including a global governance risk\ |
|
\ \ntaxonomy, a global AI policy landscape review and a global stakeholder map.\n\ |
|
• Develop and implement quarterly reporting and periodic in-person presentations\ |
|
\ to Member States on \nthe AI office’s progress against its workplan and establish\ |
|
\ feedback channels to support adjustments as \nneeded.\n• Establish a steering\ |
|
\ committee jointly led by the AI office, ITU, UNC TAD, UNESCO and other relevant\ |
|
\ \nUnited Nations entities and organizations to accelerate the work of the United\ |
|
\ Nations in service of the \nfunctions above, and review progress of the accelerated\ |
|
\ efforts every three months.\n• Promote joint learning and development opportunities\ |
|
\ for Member State representatives to support them \nto carry out their responsibilities\ |
|
\ for global AI governance, in cooperation with relevant United Nations \nentities\ |
|
\ and organizations such as the United Nations Institute for Training and Research\ |
|
\ and the United \nNations University." |
|
- source_sentence: What are some of the legal frameworks mentioned in the context |
|
that aim to protect personal information, and how do they relate to data privacy |
|
concerns? |
|
sentences: |
|
- "NOTICE & \nEXPLANATION \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations\ |
|
\ for automated systems are meant to serve as a blueprint for the development\ |
|
\ of additional \ntechnical standards and practices that are tailored for particular\ |
|
\ sectors and contexts. \nTailored to the level of risk. An assessment should\ |
|
\ be done to determine the level of risk of the auto -\nmated system. In settings\ |
|
\ where the consequences are high as determined by a risk assessment, or extensive\ |
|
\ \noversight is expected (e.g., in criminal justice or some public sector settings),\ |
|
\ explanatory mechanisms should be built into the system design so that the system’s\ |
|
\ full behavior can be explained in advance (i.e., only fully transparent models\ |
|
\ should be used), rather than as an after-the-decision interpretation. In other\ |
|
\ settings, the extent of explanation provided should be tailored to the risk\ |
|
\ level. \nValid. The explanation provided by a system should accurately reflect\ |
|
\ the factors and the influences that led \nto a particular decision, and should\ |
|
\ be meaningful for the particular customization based on purpose, target, and\ |
|
\ level of risk. While approximation and simplification may be necessary for the\ |
|
\ system to succeed based on the explanatory purpose and target of the explanation,\ |
|
\ or to account for the risk of fraud or other concerns related to revealing decision-making\ |
|
\ information, such simplifications should be done in a scientifically supportable\ |
|
\ way. Where appropriate based on the explanatory system, error ranges for the\ |
|
\ explanation should be calculated and included in the explanation, with the choice\ |
|
\ of presentation of such information balanced with usability and overall interface\ |
|
\ complexity concerns. \nDemonstrate protections for notice and explanation \n\ |
|
Reporting. Summary reporting should document the determinations made based on\ |
|
\ the above consider -\nations, including: the responsible entities for accountability\ |
|
\ purposes; the goal and use cases for the system, identified users, and impacted\ |
|
\ populations; the assessment of notice clarity and timeliness; the assessment\ |
|
\ of the explanation's validity and accessibility; the assessment of the level\ |
|
\ of risk; and the account and assessment of how explanations are tailored, including\ |
|
\ to the purpose, the recipient of the explanation, and the level of risk. Individualized\ |
|
\ profile information should be made readily available to the greatest extent\ |
|
\ possible that includes explanations for any system impacts or inferences. Reporting\ |
|
\ should be provided in a clear plain language and machine-readable manner. \n\ |
|
44" |
|
- "25 MP-2.3-002 Review and document accuracy, representativeness, relevance, suitability\ |
|
\ of data \nused at different stages of AI life cycle. Harmful Bias and Homogenization\ |
|
\ ; \nIntellectual Property \nMP-2.3-003 Deploy and document fact -checking techniques\ |
|
\ to verify the accuracy and \nveracity of information generated by GAI systems,\ |
|
\ especially when the \ninformation comes from multiple (or unknown) sources.\ |
|
\ Information Integrity \nMP-2.3-004 Develop and implement testing techniques\ |
|
\ to identify GAI produced content (e.g., synthetic media) that might be indistinguishable\ |
|
\ from human -generated content. Information Integrity \nMP-2.3-005 Implement\ |
|
\ plans for GAI systems to undergo regular adversarial testing to identify \n\ |
|
vulnerabilities and potential manipulation or misuse. Information Security \n\ |
|
AI Actor Tasks: AI Development, Domain Experts, TEVV \n \nMAP 3.4: Processes\ |
|
\ for operator and practitioner proficiency with AI system performance and trustworthiness\ |
|
\ – and relevant \ntechnical standards and certifications – are defined, assessed,\ |
|
\ and documented. \nAction ID Suggested Action GAI Risks \nMP-3.4-001 Evaluate\ |
|
\ whether GAI operators and end -users can accurately understand \ncontent lineage\ |
|
\ and origin. Human -AI Configuration ; \nInformation Integrity \nMP-3.4-002\ |
|
\ Adapt existing training programs to include modules on digital content \ntransparency.\ |
|
\ Information Integrity \nMP-3.4-003 Develop certification programs that test\ |
|
\ proficiency in managing GAI risks and \ninterpreting content provenance, relevant\ |
|
\ to specific industry and context. Information Integrity \nMP-3.4-004 Delineate\ |
|
\ human proficiency tests from tests of GAI capabilities. Human -AI Configuration\ |
|
\ \nMP-3.4-005 Implement systems to continually monitor and track the outcomes\ |
|
\ of human- GAI \nconfigurations for future refinement and improvements . Human\ |
|
\ -AI Configuration ; \nInformation Integrity \nMP-3.4-006 Involve the end -users,\ |
|
\ practitioners, and operators in GAI system in prototyping \nand testing activities.\ |
|
\ Make sure these tests cover various scenarios , such as crisis \nsituations\ |
|
\ or ethically sensitive contexts. Human -AI Configuration ; \nInformation Integrity\ |
|
\ ; Harmful Bias \nand Homogenization ; Dangerous , \nViolent, or Hateful Content\ |
|
\ \nAI Actor Tasks: AI Design, AI Development, Domain Experts, End -Users, Human\ |
|
\ Factors, Operation and Monitoring" |
|
- '65. See, e.g., Scott Ikeda. Major Data Broker Exposes 235 Million Social Media |
|
Profiles in Data Lead: Info |
|
|
|
Appears to Have Been Scraped Without Permission. CPO Magazine. Aug. 28, 2020. |
|
https:// |
|
|
|
www.cpomagazine.com/cyber-security/major-data-broker-exposes-235-million-social-media-profiles- |
|
|
|
in-data-leak/; Lily Hay Newman. 1.2 Billion Records Found Exposed Online in a |
|
Single Server . WIRED, |
|
|
|
Nov. 22, 2019. https://www.wired.com/story/billion-records-exposed-online/ |
|
|
|
66.Lola Fadulu. Facial Recognition Technology in Public Housing Prompts Backlash |
|
. New York Times. |
|
|
|
Sept. 24, 2019. |
|
|
|
https://www.nytimes.com/2019/09/24/us/politics/facial-recognition-technology-housing.html |
|
|
|
67. Jo Constantz. ‘They Were Spying On Us’: Amazon, Walmart, Use Surveillance |
|
Technology to Bust |
|
|
|
Unions. Newsweek. Dec. 13, 2021. |
|
|
|
https://www.newsweek.com/they-were-spying-us-amazon-walmart-use-surveillance-technology-bust- |
|
|
|
unions-1658603 |
|
|
|
68. See, e.g., enforcement actions by the FTC against the photo storage app Everalbaum |
|
|
|
(https://www.ftc.gov/legal-library/browse/cases-proceedings/192-3172-everalbum-inc-matter), |
|
and |
|
|
|
against Weight Watchers and their subsidiary Kurbo(https://www.ftc.gov/legal-library/browse/cases-proceedings/1923228-weight-watchersww) |
|
|
|
69. See, e.g., HIPAA, Pub. L 104-191 (1996); Fair Debt Collection Practices Act |
|
(FDCPA), Pub. L. 95-109 |
|
|
|
(1977); Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. § 1232g), |
|
Children''s Online |
|
|
|
Privacy Protection Act of 1998, 15 U.S.C. 6501–6505, and Confidential Information |
|
Protection andStatistical Efficiency Act (CIPSEA) (116 Stat. 2899) |
|
|
|
70. Marshall Allen. You Snooze, You Lose: Insurers Make The Old Adage Literally |
|
True . ProPublica. Nov. |
|
|
|
21, 2018. |
|
|
|
https://www.propublica.org/article/you-snooze-you-lose-insurers-make-the-old-adage-literally-true |
|
|
|
71.Charles Duhigg. How Companies Learn Your Secrets. The New York Times. Feb. |
|
16, 2012. |
|
|
|
https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html72. Jack Gillum |
|
and Jeff Kao. Aggression Detectors: The Unproven, Invasive Surveillance Technology |
|
|
|
Schools are Using to Monitor Students. ProPublica. Jun. 25, 2019. |
|
|
|
https://features.propublica.org/aggression-detector/the-unproven-invasive-surveillance-technology- |
|
|
|
schools-are-using-to-monitor-students/ |
|
|
|
73.Drew Harwell. Cheating-detection companies made millions during the pandemic. |
|
Now students are |
|
|
|
fighting back. Washington Post. Nov. 12, 2020. |
|
|
|
https://www.washingtonpost.com/technology/2020/11/12/test-monitoring-student-revolt/ |
|
|
|
74. See, e.g., Heather Morrison. Virtual Testing Puts Disabled Students at a Disadvantage. |
|
Government |
|
|
|
Technology. May 24, 2022. |
|
|
|
https://www.govtech.com/education/k-12/virtual-testing-puts-disabled-students-at-a-disadvantage; |
|
|
|
Lydia X. Z. Brown, Ridhi Shetty, Matt Scherer, and Andrew Crawford. Ableism And |
|
Disability |
|
|
|
Discrimination In New Surveillance Technologies: How new surveillance technologies |
|
in education, |
|
|
|
policing, health care, and the workplace disproportionately harm disabled people |
|
. Center for Democracy |
|
|
|
and Technology Report. May 24, 2022.https://cdt.org/insights/ableism-and-disability-discrimination-in-new-surveillance-technologies-how-new-surveillance-technologies-in-education-policing-health-care-and-the-workplace-disproportionately-harm-disabled-people/ |
|
|
|
69' |
|
model-index: |
|
- name: SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5 |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: Unknown |
|
type: unknown |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.71875 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.921875 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.96875 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 1.0 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.71875 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.30729166666666663 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19374999999999998 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09999999999999999 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.71875 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.921875 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.96875 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 1.0 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.8727659974381962 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.8304687500000002 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.8304687500000001 |
|
name: Cosine Map@100 |
|
- type: dot_accuracy@1 |
|
value: 0.734375 |
|
name: Dot Accuracy@1 |
|
- type: dot_accuracy@3 |
|
value: 0.921875 |
|
name: Dot Accuracy@3 |
|
- type: dot_accuracy@5 |
|
value: 0.96875 |
|
name: Dot Accuracy@5 |
|
- type: dot_accuracy@10 |
|
value: 1.0 |
|
name: Dot Accuracy@10 |
|
- type: dot_precision@1 |
|
value: 0.734375 |
|
name: Dot Precision@1 |
|
- type: dot_precision@3 |
|
value: 0.30729166666666663 |
|
name: Dot Precision@3 |
|
- type: dot_precision@5 |
|
value: 0.19374999999999998 |
|
name: Dot Precision@5 |
|
- type: dot_precision@10 |
|
value: 0.09999999999999999 |
|
name: Dot Precision@10 |
|
- type: dot_recall@1 |
|
value: 0.734375 |
|
name: Dot Recall@1 |
|
- type: dot_recall@3 |
|
value: 0.921875 |
|
name: Dot Recall@3 |
|
- type: dot_recall@5 |
|
value: 0.96875 |
|
name: Dot Recall@5 |
|
- type: dot_recall@10 |
|
value: 1.0 |
|
name: Dot Recall@10 |
|
- type: dot_ndcg@10 |
|
value: 0.8785327200386421 |
|
name: Dot Ndcg@10 |
|
- type: dot_mrr@10 |
|
value: 0.8382812500000002 |
|
name: Dot Mrr@10 |
|
- type: dot_map@100 |
|
value: 0.8382812500000001 |
|
name: Dot Map@100 |
|
--- |
|
|
|
# SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5 |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) <!-- at revision 104333d6af6f97649377c2afbde10a7704870c7b --> |
|
- **Maximum Sequence Length:** 8192 tokens |
|
- **Output Dimensionality:** 1024 dimensions
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
``` |
|
SentenceTransformer( |
|
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel |
|
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
|
) |
|
``` |
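
The module composition above can also be inspected programmatically. A minimal sketch, assuming the model is published under the placeholder id used in the usage example below; since the base architecture (`NewModel`) ships custom modeling code, `trust_remote_code=True` may be required:

```python
from sentence_transformers import SentenceTransformer

# Placeholder id, as in the usage example below.
model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

print(model.max_seq_length)                      # 8192
print(model.get_sentence_embedding_dimension())  # 1024
print(model[1].pooling_mode_cls_token)           # True: embeddings are taken from the [CLS] token
```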
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash |
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can load this model and run inference. |
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Download from the 🤗 Hub |
|
model = SentenceTransformer("sentence_transformers_model_id") |
|
# Run inference |
|
sentences = [ |
|
'What are some of the legal frameworks mentioned in the context that aim to protect personal information, and how do they relate to data privacy concerns?', |
|
"65. See, e.g., Scott Ikeda. Major Data Broker Exposes 235 Million Social Media Profiles in Data Lead: Info\nAppears to Have Been Scraped Without Permission. CPO Magazine. Aug. 28, 2020. https://\nwww.cpomagazine.com/cyber-security/major-data-broker-exposes-235-million-social-media-profiles-\nin-data-leak/; Lily Hay Newman. 1.2 Billion Records Found Exposed Online in a Single Server . WIRED,\nNov. 22, 2019. https://www.wired.com/story/billion-records-exposed-online/\n66.Lola Fadulu. Facial Recognition Technology in Public Housing Prompts Backlash . New York Times.\nSept. 24, 2019.\nhttps://www.nytimes.com/2019/09/24/us/politics/facial-recognition-technology-housing.html\n67. Jo Constantz. ‘They Were Spying On Us’: Amazon, Walmart, Use Surveillance Technology to Bust\nUnions. Newsweek. Dec. 13, 2021.\nhttps://www.newsweek.com/they-were-spying-us-amazon-walmart-use-surveillance-technology-bust-\nunions-1658603\n68. See, e.g., enforcement actions by the FTC against the photo storage app Everalbaum\n(https://www.ftc.gov/legal-library/browse/cases-proceedings/192-3172-everalbum-inc-matter), and\nagainst Weight Watchers and their subsidiary Kurbo(https://www.ftc.gov/legal-library/browse/cases-proceedings/1923228-weight-watchersww)\n69. See, e.g., HIPAA, Pub. L 104-191 (1996); Fair Debt Collection Practices Act (FDCPA), Pub. L. 95-109\n(1977); Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. § 1232g), Children's Online\nPrivacy Protection Act of 1998, 15 U.S.C. 6501–6505, and Confidential Information Protection andStatistical Efficiency Act (CIPSEA) (116 Stat. 2899)\n70. Marshall Allen. You Snooze, You Lose: Insurers Make The Old Adage Literally True . ProPublica. Nov.\n21, 2018.\nhttps://www.propublica.org/article/you-snooze-you-lose-insurers-make-the-old-adage-literally-true\n71.Charles Duhigg. How Companies Learn Your Secrets. The New York Times. Feb. 16, 2012.\nhttps://www.nytimes.com/2012/02/19/magazine/shopping-habits.html72. Jack Gillum and Jeff Kao. Aggression Detectors: The Unproven, Invasive Surveillance Technology\nSchools are Using to Monitor Students. ProPublica. Jun. 25, 2019.\nhttps://features.propublica.org/aggression-detector/the-unproven-invasive-surveillance-technology-\nschools-are-using-to-monitor-students/\n73.Drew Harwell. Cheating-detection companies made millions during the pandemic. Now students are\nfighting back. Washington Post. Nov. 12, 2020.\nhttps://www.washingtonpost.com/technology/2020/11/12/test-monitoring-student-revolt/\n74. See, e.g., Heather Morrison. Virtual Testing Puts Disabled Students at a Disadvantage. Government\nTechnology. May 24, 2022.\nhttps://www.govtech.com/education/k-12/virtual-testing-puts-disabled-students-at-a-disadvantage;\nLydia X. Z. Brown, Ridhi Shetty, Matt Scherer, and Andrew Crawford. Ableism And Disability\nDiscrimination In New Surveillance Technologies: How new surveillance technologies in education,\npolicing, health care, and the workplace disproportionately harm disabled people . Center for Democracy\nand Technology Report. May 24, 2022.https://cdt.org/insights/ableism-and-disability-discrimination-in-new-surveillance-technologies-how-new-surveillance-technologies-in-education-policing-health-care-and-the-workplace-disproportionately-harm-disabled-people/\n69", |
|
'25 MP-2.3-002 Review and document accuracy, representativeness, relevance, suitability of data \nused at different stages of AI life cycle. Harmful Bias and Homogenization ; \nIntellectual Property \nMP-2.3-003 Deploy and document fact -checking techniques to verify the accuracy and \nveracity of information generated by GAI systems, especially when the \ninformation comes from multiple (or unknown) sources. Information Integrity \nMP-2.3-004 Develop and implement testing techniques to identify GAI produced content (e.g., synthetic media) that might be indistinguishable from human -generated content. Information Integrity \nMP-2.3-005 Implement plans for GAI systems to undergo regular adversarial testing to identify \nvulnerabilities and potential manipulation or misuse. Information Security \nAI Actor Tasks: AI Development, Domain Experts, TEVV \n \nMAP 3.4: Processes for operator and practitioner proficiency with AI system performance and trustworthiness – and relevant \ntechnical standards and certifications – are defined, assessed, and documented. \nAction ID Suggested Action GAI Risks \nMP-3.4-001 Evaluate whether GAI operators and end -users can accurately understand \ncontent lineage and origin. Human -AI Configuration ; \nInformation Integrity \nMP-3.4-002 Adapt existing training programs to include modules on digital content \ntransparency. Information Integrity \nMP-3.4-003 Develop certification programs that test proficiency in managing GAI risks and \ninterpreting content provenance, relevant to specific industry and context. Information Integrity \nMP-3.4-004 Delineate human proficiency tests from tests of GAI capabilities. Human -AI Configuration \nMP-3.4-005 Implement systems to continually monitor and track the outcomes of human- GAI \nconfigurations for future refinement and improvements . Human -AI Configuration ; \nInformation Integrity \nMP-3.4-006 Involve the end -users, practitioners, and operators in GAI system in prototyping \nand testing activities. Make sure these tests cover various scenarios , such as crisis \nsituations or ethically sensitive contexts. Human -AI Configuration ; \nInformation Integrity ; Harmful Bias \nand Homogenization ; Dangerous , \nViolent, or Hateful Content \nAI Actor Tasks: AI Design, AI Development, Domain Experts, End -Users, Human Factors, Operation and Monitoring', |
|
] |
|
embeddings = model.encode(sentences) |
|
print(embeddings.shape) |
|
# [3, 1024] |
|
|
|
# Get the similarity scores for the embeddings |
|
similarities = model.similarity(embeddings, embeddings) |
|
print(similarities.shape) |
|
# [3, 3] |
|
``` |
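
Because this checkpoint is tuned for query-passage retrieval (see Evaluation below), the embeddings are typically used for semantic search rather than only pairwise comparison. The snippet below is a minimal sketch using `sentence_transformers.util.semantic_search`; the model path and the example corpus are placeholders, not part of this repository.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder path: substitute the identifier of this fine-tuned checkpoint.
# trust_remote_code=True is required by the gte-large-en-v1.5 architecture.
model = SentenceTransformer("path/to/this-model", trust_remote_code=True)

# Hypothetical corpus of policy passages to search over.
corpus = [
    "The Blueprint for an AI Bill of Rights is a non-binding white paper.",
    "Generative AI models emulate the structure of their input data.",
]
corpus_embeddings = model.encode(corpus)

query_embedding = model.encode("Is the AI Bill of Rights legally binding?")

# Returns, for each query, the top_k corpus indices ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])
```
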
|
|
|
<!-- |
|
### Direct Usage (Transformers) |
|
|
|
<details><summary>Click to see the direct usage in Transformers</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Downstream Usage (Sentence Transformers) |
|
|
|
You can finetune this model on your own dataset. |
|
|
|
<details><summary>Click to expand</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Out-of-Scope Use |
|
|
|
*List how the model may foreseeably be misused and address what users ought not to do with the model.* |
|
--> |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Information Retrieval |
|
|
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.7188 | |
|
| cosine_accuracy@3 | 0.9219 | |
|
| cosine_accuracy@5 | 0.9688 | |
|
| cosine_accuracy@10 | 1.0 | |
|
| cosine_precision@1 | 0.7188 | |
|
| cosine_precision@3 | 0.3073 | |
|
| cosine_precision@5 | 0.1937 | |
|
| cosine_precision@10 | 0.1 | |
|
| cosine_recall@1 | 0.7188 | |
|
| cosine_recall@3 | 0.9219 | |
|
| cosine_recall@5 | 0.9688 | |
|
| cosine_recall@10 | 1.0 | |
|
| cosine_ndcg@10 | 0.8728 | |
|
| cosine_mrr@10 | 0.8305 | |
|
| cosine_map@100 | 0.8305 | |
|
| dot_accuracy@1 | 0.7344 | |
|
| dot_accuracy@3 | 0.9219 | |
|
| dot_accuracy@5 | 0.9688 | |
|
| dot_accuracy@10 | 1.0 | |
|
| dot_precision@1 | 0.7344 | |
|
| dot_precision@3 | 0.3073 | |
|
| dot_precision@5 | 0.1937 | |
|
| dot_precision@10 | 0.1 | |
|
| dot_recall@1 | 0.7344 | |
|
| dot_recall@3 | 0.9219 | |
|
| dot_recall@5 | 0.9688 | |
|
| dot_recall@10 | 1.0 | |
|
| dot_ndcg@10 | 0.8785 | |
|
| dot_mrr@10 | 0.8383 | |
|
| **dot_map@100** | **0.8383** | |
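
The table above is produced by the evaluator linked earlier. As a rough illustration of how a comparable evaluation could be wired up, the sketch below uses invented query/corpus contents and IDs as placeholders; only the `InformationRetrievalEvaluator` API itself is taken from the library.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data: query id -> text, doc id -> text,
# and the set of relevant doc ids for each query.
queries = {"q1": "What does the Blueprint for an AI Bill of Rights cover?"}
corpus = {
    "d1": "The Blueprint for an AI Bill of Rights is a non-binding white paper.",
    "d2": "An unrelated passage about floating-point operations.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="val",
)

model = SentenceTransformer("path/to/this-model", trust_remote_code=True)  # placeholder path
results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, NDCG, MRR, MAP
print(results)
```
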
|
|
|
<!-- |
|
## Bias, Risks and Limitations |
|
|
|
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.* |
|
--> |
|
|
|
<!-- |
|
### Recommendations |
|
|
|
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.* |
|
--> |
|
|
|
## Training Details |
|
|
|
### Training Dataset |
|
|
|
#### Unnamed Dataset |
|
|
|
|
|
* Size: 586 training samples |
|
* Columns: <code>sentence_0</code> and <code>sentence_1</code> |
|
* Approximate statistics based on the first 586 samples: |
|
| | sentence_0 | sentence_1 | |
|
|:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| |
|
| type | string | string | |
|
| details | <ul><li>min: 20 tokens</li><li>mean: 35.95 tokens</li><li>max: 60 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 545.8 tokens</li><li>max: 1018 tokens</li></ul> | |
|
* Samples: |
|
| sentence_0 | sentence_1 | |
|
|:---------|:---------|
|
| <code>What are the primary objectives outlined in the "Blueprint for an AI Bill of Rights" as it pertains to the American people?</code> | <code>BLUEPRINT FOR AN <br>AI B ILL OF <br>RIGHTS <br>MAKING AUTOMATED <br>SYSTEMS WORK FOR <br>THE AMERICAN PEOPLE <br>OCTOBER 2022</code> | |
|
| <code>In what ways does the document propose to ensure that automated systems are designed and implemented to benefit society?</code> | <code>BLUEPRINT FOR AN <br>AI B ILL OF <br>RIGHTS <br>MAKING AUTOMATED <br>SYSTEMS WORK FOR <br>THE AMERICAN PEOPLE <br>OCTOBER 2022</code> | |
|
| <code>What is the primary purpose of the Blueprint for an AI Bill of Rights as published by the White House Office of Science and Technology Policy in October 2022?</code> | <code>About this Document <br>The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was <br>published by the White House Office of Science and Technology Policy in October 2022. This framework was <br>released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered <br>world.” Its release follows a year of public engagement to inform this initiative. The framework is available <br>online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights <br>About the Office of Science and Technology Policy <br>The Office of Science and Technology Policy (OSTP) was established by the National Science and Technology <br>Policy, Organization, and Priorities Act of 1976 to provide the President and others within the Executive Office <br>of the President with advice on the scientific, engineering, and technological aspects of the economy, national <br>security, health, foreign relations, the environment, and the technological recovery and use of resources, among <br>other topics. OSTP leads interagency science and technology policy coordination efforts, assists the Office of <br>Management and Budget (OMB) with an annual review and analysis of Federal research and development in <br>budgets, and serves as a source of scientific and technological analysis and judgment for the President with <br>respect to major policies, plans, and programs of the Federal Government. <br>Legal Disclaimer <br>The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People is a white paper <br>published by the White House Office of Science and Technology Policy. It is intended to support the <br>development of policies and practices that protect civil rights and promote democratic values in the building, <br>deployment, and governance of automated systems. <br>The Blueprint for an AI Bill of Rights is non-binding and does not constitute U.S. government policy. It <br>does not supersede, modify, or direct an interpretation of any existing statute, regulation, policy, or <br>international instrument. It does not constitute binding guidance for the public or Federal agencies and <br>therefore does not require compliance with the principles described herein. It also is not determinative of what <br>the U.S. government’s position will be in any international negotiation. Adoption of these principles may not <br>meet the requirements of existing statutes, regulations, policies, or international instruments, or the <br>requirements of the Federal agencies that enforce them. These principles are not intended to, and do not, <br>prohibit or limit any lawful activity of a government agency, including law enforcement, national security, or <br>intelligence activities. <br>The appropriate application of the principles set forth in this white paper depends significantly on the <br>context in which automated systems are being utilized. In some circumstances, application of these principles <br>in whole or in part may not be appropriate given the intended use of automated systems to achieve government <br>agency missions. <br>Future sector-specific guidance will likely be necessary and important for guiding the use of <br>automated systems in certain settings such as AI systems used as part of school building security or automated <br>health diagnostic systems. <br>The Blueprint for an AI Bill of Rights recognizes that law enforcement activities require a balancing of <br>equities, for example, between the protection of sensitive law enforcement information and the principle of <br>notice; as such, notice may not be appropriate, or may need to be adjusted to protect sources, methods, and <br>other law enforcement equities. Even in contexts where these principles may not apply in whole or in part, <br>federal departments and agencies remain subject to judicial, privacy, and civil liberties oversight as well as <br>existing policies and safeguards that govern automated systems, including, for example, Executive Order 13960, <br>Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (December 2020). <br>This white paper recognizes that national security (which includes certain law enforcement and <br>homeland security activities) and defense activities are of increased sensitivity and interest to our nation’s <br>adversaries and are often subject to special requirements, such as those governing classified information and <br>other protected data. Such activities require alternative, compatible safeguards through existing policies that <br>govern automated systems and AI, such as the Department of Defense (DOD) AI Ethical Principles and <br>Responsible AI Implementation Pathway and the Intelligence Community (IC) AI Ethics Principles and <br>Framework. The implementation of these policies to national security and defense activities can be informed by <br>the Blueprint for an AI Bill of Rights where feasible.</code> |
|
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters: |
|
```json |
|
{ |
|
"scale": 20.0, |
|
"similarity_fct": "cos_sim" |
|
} |
|
``` |
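
In code, instantiating the loss with these parameters looks roughly like the following sketch; note that `scale=20.0` and cosine similarity are also the library defaults.

```python
from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# In-batch negatives: for each (sentence_0, sentence_1) pair, every other
# sentence_1 in the batch is treated as a negative for that query.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```
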
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: steps |
|
- `per_device_train_batch_size`: 5 |
|
- `per_device_eval_batch_size`: 5 |
|
- `num_train_epochs`: 2 |
|
- `multi_dataset_batch_sampler`: round_robin |
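
Under the v3 trainer API this card was generated from, those non-default settings map onto the training arguments roughly as sketched below; the dataset contents, output directory, and the reuse of the training pairs as an eval split are placeholders, not the actual training setup.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# Placeholder pairs standing in for the 586 (sentence_0, sentence_1) samples.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is the primary purpose of the framework?"],
    "sentence_1": ["The framework supports rights-respecting automated systems."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    num_train_epochs=2,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; a held-out split would be used in practice
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```
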
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: steps |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 5 |
|
- `per_device_eval_batch_size`: 5 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 1 |
|
- `eval_accumulation_steps`: None |
|
- `torch_empty_cache_steps`: None |
|
- `learning_rate`: 5e-05 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1 |
|
- `num_train_epochs`: 2 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: linear |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.0 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: False |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: None |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: False |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: False |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: False |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `dispatch_batches`: None |
|
- `split_batches`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `eval_on_start`: False |
|
- `eval_use_gather_object`: False |
|
- `batch_sampler`: batch_sampler |
|
- `multi_dataset_batch_sampler`: round_robin |
|
|
|
</details> |
|
|
|
### Training Logs |
|
| Epoch | Step | dot_map@100 | |
|
|:------:|:----:|:-----------:| |
|
| 0.4237 | 50 | 0.8383 | |
|
|
|
|
|
### Framework Versions |
|
- Python: 3.10.12 |
|
- Sentence Transformers: 3.1.1 |
|
- Transformers: 4.44.2 |
|
- PyTorch: 2.4.1+cu121 |
|
- Accelerate: 0.34.2 |
|
- Datasets: 3.0.1 |
|
- Tokenizers: 0.19.1 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/1908.10084", |
|
} |
|
``` |
|
|
|
#### MultipleNegativesRankingLoss |
|
```bibtex |
|
@misc{henderson2017efficient, |
|
title={Efficient Natural Language Response Suggestion for Smart Reply}, |
|
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
|
year={2017}, |
|
eprint={1705.00652}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |
|
|
|
<!-- |
|
## Glossary |
|
|
|
*Clearly define terms in order to be accessible across audiences.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Authors |
|
|
|
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Contact |
|
|
|
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.* |
|
--> |