optimizing hyperparameters for blueCarbon model
- 1_Pooling/config.json +2 -1
- README.md +28 -133
- config.json +1 -1
- config_sentence_transformers.json +3 -1
- config_setfit.json +2 -2
- model.safetensors +1 -1
- model_head.pkl +2 -2
- special_tokens_map.json +2 -2
1_Pooling/config.json
CHANGED
@@ -5,5 +5,6 @@
   "pooling_mode_max_tokens": false,
   "pooling_mode_mean_sqrt_len_tokens": false,
   "pooling_mode_weightedmean_tokens": false,
-  "pooling_mode_lasttoken": false
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
 }
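The added `include_prompt` flag belongs to the Pooling module configuration that sentence-transformers reads when the model body is loaded. As a minimal sketch (not part of this commit, and assuming a sentence-transformers release recent enough to recognize the flag, roughly >= 2.4), the updated pooling config can be inspected like this:

```python
# Sketch only: inspect the Pooling config of the model body after this commit.
# Assumes sentence-transformers >= 2.4, where "include_prompt" is recognized.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ignaciosg/blueCarbon")
pooling = model[1]  # module 0 is the Transformer, module 1 is the Pooling layer
print(pooling.get_config_dict())  # pooling_mode_* flags plus include_prompt
```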
README.md
CHANGED
@@ -7,112 +7,29 @@ tags:
 - generated_from_setfit_trainer
 metrics:
 - accuracy
-widget:
-- text: interest in third generation biomass such as macroalgae has increased due
-  to their high biomass yield, absence of lignin in their tissues, lower competition
-  for land and fresh water, no fertilization requirements, and efficient co2 capture
-  in coastal ecosystems. however, several challenges still exist in the development
-  of cost effective technologies for processing large amounts of macroalgae. recently,
-  genetically modified micro organisms able to convert brown macroalgae carbohydrates
-  into bioethanol were developed, but still no attempt to scale up production has
-  been proposed. based on giant kelp farming and bioethanol production program carried
-  out in chile, we were able to test and adapt this technology as first attempt
-  to scale up this process using 75 fermentation of genetically modified escherichia
-  coli. laboratory fermentation tests results showed that although biomass growth
-  and yield are not greatly affected by the alginate mannitol ratio, ethanol yield
-  showed clear maximum around alginate mannitol ratio. in . pyrifera, much greater
-  proportion of alginate and lower mannitol abundance is found. in order to make
-  the most of the carbohydrates available for fermentation, we developed four stage
-  process model for scaling up, including acid leaching, depolymerization, saccharification,
-  and fermentation steps. using this process, we obtained .213 kg ethanol kg dry
-  macroalgae, equivalent to . of ethanol hectare year, reaching 64 of the maximum
-  theoretical ethanol yield. we propose strategies to increase this yield, including
-  synthetic biology pathway engineering approaches and process optimization targets.
-  2016 society of chemical industry and john wiley sons, ltd
-- text: producing concrete that incorporates carbon dioxide into the mix is leveraged
-  to reduce the carbon footprint and produce more sustainable concrete. as the concrete
-  dries, the co2 is mineralized and permanently incorporated into the early carbonation.
-  experimental work has been conducted, and hundreds of specimens with varying ratios
-  of co2 to binder content were cast. co2 to binder ratios of were used to test
-  concrete in workability , mechanical properties , and durability performance .
-  the chemical tests were also conducted to identify the changes in hardened concrete
-  composition for the three mixes . all specimens were field cured and exposed to
-  the coastal environment of ras al khair industrial city in saudi arabia. the results
-  showed that the co2 to binder ratio of . improved the concrete properties, in
-  particular, the effect was clear with higher slump and comparable strength compared
-  to the standard concrete without co2. however, the co2 to binder ratio of . shows
-  negligible increase in the chloride permeability and the internal chloride ion
-  content compared to the standard concrete without co2, whereas the internal sulfate
-  ion content has not increased for both co2 to binder ratios in comparison with
-  the standard concrete without co2, which indicate no reduction in concrete durability.
-  2023 isec press.
-- text: mangroves are ecosystems made up of trees or shrubs that develop in the intertidal
-  zone and provide many vital environmental services for livelihoods in coastal
-  areas. they are habitat for the reproduction of several marine species. they afford
-  protection from hurricanes, tides, sea level rise and prevent the erosion of the
-  coasts. just one hectare of mangrove forest can hold up to ,000 tons of carbon
-  dioxide, more than tropical forests and jungles. mexico is one of the countries
-  with the greatest abundance of mangroves in the world, with more than 700,000
-  ha. blue carbon can be novel mechanism for promoting communication and cooperation
-  between the investor, the government, the users, and beneficiaries of the environmental
-  services of these ecosystems, creating public private social partnerships through
-  mechanisms such as payment for environmental services, credits, or the voluntary
-  carbon market. this chapter explores the possibilities of incorporating blue carbon
-  in emissions markets. we explore the huge potential of mexico blue carbon to sequester
-  co2. then we analyse the new market instrument that allows countries to sell or
-  transfer mitigation results internationally the sustainable development mechanism
-  , established in the paris agreement. secondly, we present the progress of the
-  commission for environmental cooperation to standardize the methodologies to assess
-  their stock and determine the magnitude of the blue carbon sinks. thirdly, as
-  an opportunity for mexico, the collaboration with the california cap and trade
-  program is analysed. we conclude that blue carbon is very important mitigation
-  tool to be included in the compensation schemes on regional and global levels.
-  additionally, mangrove protection is an excellent example of the mitigation adaptation
-  sustainable development relationship, as well as fostering of governance by the
-  inclusion of the coastal communities in decision making and incomes. 2022, the
-  author.
-- text: featured application the findings obtained from this study have implications
-  for global blue carbon budgeting. abstract field monitoring and incubation experiments
-  were conducted to evaluate the litter yield and examine the decomposition of the
-  litter of three representative mangrove species frequently used for mangrove re
-  vegetation in subtropical mudflat on the south china coast. the results show that
-  the litter yield of the investigated mangrove species varied significantly from
-  season to season. the annual litter production was in the following decreasing
-  order heritiera littoralis thespesia populnea kandelia obovata. initially, rapid
-  decomposition of easily degradable components of the litter materials resulted
-  in marked weight loss of the mangrove litter. there was good linear relationship
-  between the length of field incubation time and the litter decomposition rate
-  for both the branch and the leaf portion of the three investigated mangrove species.
-  approximately 50 or more of the added mangrove litter could be decomposed within
-  one year and the decomposed litter could be incorporated into the underlying soils
-  and consequently affect the soil carbon dynamics. an annual soil carbon increase
-  from .37 to .64 kg in the top cm of the soil was recorded for the investigated
-  mangrove species.
-- text: seagrasses provide multitude of ecosystem services and serve as important
-  organic carbon stores. however, seagrass habitats are declining worldwide, threatened
-  by global climate change and regional shifts in water quality. acoustical methods
-  have been applied to assess changes in oxygen production of seagrass meadows since
-  sound propagation is sensitive to the presence of bubbles, which exist both within
-  the plant tissue and freely floating the water as byproducts of photosynthesis.
-  this work applies acoustic remote sensing techniques to characterize two different
-  regions of seagrass meadow densely vegetated meadow of thalassia testudinum and
-  sandy region sparsely populated by isolated stands of . testudinum. bayesian approach
-  is applied to estimate the posterior probability distributions of the unknown
-  model parameters. the sensitivity of sound to the void fraction of gas present
-  in the seagrass meadow was established by the narrow marginal probability distributions
-  that provided distinct estimates of the void fraction between the two sites. the
-  absolute values of the estimated void fractions are biased by limitations in the
-  forward model, which does not capture the full complexity of the seagrass environment.
-  nevertheless, the results demonstrate the potential use of acoustical methods
-  to remotely sense seagrass health and density.
+widget: []
 pipeline_tag: text-classification
 inference: false
 base_model: sentence-transformers/paraphrase-mpnet-base-v2
+model-index:
+- name: SetFit with sentence-transformers/paraphrase-mpnet-base-v2
+  results:
+  - task:
+      type: text-classification
+      name: Text Classification
+    dataset:
+      name: Unknown
+      type: unknown
+      split: test
+    metrics:
+    - type: accuracy
+      value: 0.1502397442727757
+      name: Accuracy
 ---
 
 # SetFit with sentence-transformers/paraphrase-mpnet-base-v2
 
-This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A OneVsRestClassifier instance is used for classification.
 
 The model has been trained using an efficient few-shot learning technique that involves:
 
@@ -124,7 +41,7 @@ The model has been trained using an efficient few-shot learning technique that i
 ### Model Description
 - **Model Type:** SetFit
 - **Sentence Transformer body:** [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2)
-- **Classification head:** a
+- **Classification head:** a OneVsRestClassifier instance
 - **Maximum Sequence Length:** 512 tokens
 <!-- - **Number of Classes:** Unknown -->
 <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
@@ -137,6 +54,13 @@ The model has been trained using an efficient few-shot learning technique that i
 - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
 - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
 
+## Evaluation
+
+### Metrics
+| Label   | Accuracy |
+|:--------|:---------|
+| **all** | 0.1502   |
+
 ## Uses
 
 ### Direct Use for Inference
@@ -155,7 +79,7 @@ from setfit import SetFitModel
 # Download from the 🤗 Hub
 model = SetFitModel.from_pretrained("ignaciosg/blueCarbon")
 # Run inference
-preds = model("
+preds = model("I loved the spiderman movie!")
 ```
 
 <!--
@@ -184,42 +108,13 @@ preds = model("featured application the findings obtained from this study have i
 
 ## Training Details
 
-### Training Set Metrics
-| Training set | Min | Median   | Max |
-|:-------------|:----|:---------|:----|
-| Word count   | 80  | 236.0127 | 453 |
-
-### Training Hyperparameters
-- batch_size: (1, 1)
-- num_epochs: (1, 1)
-- max_steps: 1
-- sampling_strategy: oversampling
-- num_iterations: 1
-- body_learning_rate: (2e-05, 1e-05)
-- head_learning_rate: 0.01
-- loss: CosineSimilarityLoss
-- distance_metric: cosine_distance
-- margin: 0.25
-- end_to_end: False
-- use_amp: False
-- warmup_proportion: 0.1
-- max_length: 750
-- seed: 42
-- eval_max_steps: 1
-- load_best_model_at_end: False
-
-### Training Results
-| Epoch  | Step | Training Loss | Validation Loss |
-|:------:|:----:|:-------------:|:---------------:|
-| 0.0001 | 1    | 0.2289        | -               |
-
 ### Framework Versions
 - Python: 3.10.12
 - SetFit: 1.0.3
-- Sentence Transformers: 2.
-- Transformers: 4.
+- Sentence Transformers: 2.5.1
+- Transformers: 4.38.1
 - PyTorch: 2.1.0+cu121
-- Datasets: 2.
+- Datasets: 2.18.0
 - Tokenizers: 0.15.2
 
 ## Citation
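The training hyperparameters removed from the old README (batch_size, num_epochs, body/head learning rates, sampling strategy, and so on) are the values SetFit exposes through `TrainingArguments`. Below is a minimal sketch of how such a configuration would be reproduced with the setfit >= 1.0 API; the tiny inline dataset is a hypothetical placeholder, not the data behind this model:

```python
# Sketch only, not part of this commit: the hyperparameters listed in the old
# README expressed as setfit.TrainingArguments (setfit >= 1.0 API).
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical placeholder data; the real training set is not published here.
train_dataset = Dataset.from_dict({
    "text": [
        "mangrove soils store large amounts of blue carbon",
        "seagrass meadows sequester organic carbon in sediments",
        "concrete production is a major source of co2 emissions",
        "cement carbonation partially re-absorbs process emissions",
    ],
    "label": [1, 1, 0, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(
    batch_size=(1, 1),                  # (embedding phase, classifier phase)
    num_epochs=(1, 1),
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    sampling_strategy="oversampling",
    max_length=750,
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```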
config.json
CHANGED
@@ -19,6 +19,6 @@
   "pad_token_id": 1,
   "relative_attention_num_buckets": 32,
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.38.1",
   "vocab_size": 30527
 }
config_sentence_transformers.json
CHANGED
@@ -3,5 +3,7 @@
   "sentence_transformers": "2.0.0",
   "transformers": "4.7.0",
   "pytorch": "1.9.0+cu102"
-  }
+  },
+  "prompts": {},
+  "default_prompt_name": null
 }
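The new `prompts` and `default_prompt_name` keys are read by recent sentence-transformers releases, which can prepend a named prompt to inputs at encode time. The shipped config leaves them empty/null; the sketch below only illustrates the mechanism with a hypothetical prompt (assumes sentence-transformers >= 2.4):

```python
# Sketch only: how the "prompts" / "default_prompt_name" keys are consumed.
# The prompt text here is hypothetical; this commit ships an empty prompts dict.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "ignaciosg/blueCarbon",
    prompts={"classification": "Classify this abstract: "},
    default_prompt_name="classification",
)
embeddings = model.encode(["mangroves store large amounts of blue carbon"])
print(embeddings.shape)
```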
config_setfit.json
CHANGED
@@ -1,4 +1,4 @@
 {
-  "
-  "
+  "labels": null,
+  "normalize_embeddings": false
 }
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8cb0abee1b3ccaf4776107b24d1b77598c93bc13a12d9e6dea65fe9b1657c963
 size 437967672
model_head.pkl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:90f1d4020f711d7b81d1c955d4e7c42e73ded0c0c3ffc9a679832d8e9e4205bf
+size 195396
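`model_head.pkl` stores the pickled classification head (a OneVsRestClassifier, per the updated README). A minimal sketch for checking that the new head loads together with the body; this is an illustration, not part of the commit:

```python
# Sketch only: verify the updated classification head after downloading.
from setfit import SetFitModel

model = SetFitModel.from_pretrained("ignaciosg/blueCarbon")
print(type(model.model_head))  # per the README, an sklearn OneVsRestClassifier
preds = model(["mangrove forests are efficient carbon sinks"])
print(preds)
```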
special_tokens_map.json
CHANGED
@@ -9,7 +9,7 @@
   "cls_token": {
     "content": "<s>",
     "lstrip": false,
-    "normalized":
+    "normalized": false,
     "rstrip": false,
     "single_word": false
   },
@@ -37,7 +37,7 @@
   "sep_token": {
     "content": "</s>",
     "lstrip": false,
-    "normalized":
+    "normalized": false,
     "rstrip": false,
     "single_word": false
   },