How to Apply the Fine-Tuned Transformer Models

Introduction

In this short tutorial, we show how to apply the fine-tuned models presented and validated in Severin et al. (2023) to individual sentences from peer review reports. We provide classification code for Python and for R (via the reticulate package).

Note: We used the corpus_segment() function of the quanteda package to split the full reviews into sentences.
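For readers working in Python only, the splitting step can be approximated as follows. This is a minimal sketch, not the segmentation used in the paper: it swaps quanteda's corpus_segment() for a naive regular-expression rule, and the function name split_into_sentences and the example review text are made up for illustration. The rule will mis-split abbreviations such as "et al." or "e.g.".

```python
import re

# Naive boundary rule (an assumption, not quanteda's algorithm): split at
# whitespace that follows ., ! or ? and precedes an uppercase letter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_into_sentences(review_text):
    """Split one review into a list of sentence strings."""
    parts = _SENTENCE_BOUNDARY.split(review_text.strip())
    return [part.strip() for part in parts if part.strip()]


# Invented example text, used only to show the call.
review = (
    "The methods section is well-detailed. "
    "However, the results need a clearer presentation. "
    "Overall, this is a solid paper."
)

for sentence in split_into_sentences(review):
    print(sentence)
```

Each returned string can then be passed to the classifiers as one input sentence.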

We provide a CSV file, sentences_example.csv (included in the Zenodo repository), which contains 20 invented sentences from an imaginary review. All transformer models are included in separate folders in the Zenodo repository of the paper.
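A file of this kind could be read in Python roughly as sketched below. The column name sentence is taken from the R code in this tutorial (data$sentence). The helper name read_sentences and the temporary demo file are assumptions for illustration, so the block runs without the Zenodo download.

```python
import csv
import tempfile
from pathlib import Path


def read_sentences(csv_path, column="sentence"):
    """Return the values of one column of a UTF-8 CSV file as a list of strings."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        return [row[column] for row in reader]


# Demo with a small stand-in file; in practice, pass the path to
# sentences_example.csv from the Zenodo repository instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_sentences.csv"
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sentence"])
        writer.writerow(["The methods section is well-detailed, outlining the criteria."])
        writer.writerow(["The paper could benefit from a clearer presentation."])
    demo_sentences = read_sentences(demo_path)

print(len(demo_sentences), "sentences read")
```

Using the csv module rather than splitting lines on commas matters here, because the sentences themselves contain commas.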

Classifying Sentences Using R and reticulate

# load required packages
library(tidyverse)
library(reticulate) # for connection to Python
library(here)
library(kableExtra)
# read in data
data <- read.csv(here("sentences_example.csv"),
                 fileEncoding = "utf-8")

# get transformer via reticulate
transformer <- reticulate::import('transformers')

# use fine-tuned model
fine_tuned_categories <- c("criticism",
                           "example",
                           "importance_and_relevance",
                           "materials_and_methods",
                           "praise",
                           "presentation_and_reporting",
                           "results_and_discussion",
                           "suggestion_and_solution")


# get fine-tuned distilbert models for all categories
fine_tuned_models <- paste0("distilbert_models/",
                            fine_tuned_categories)

# specify the tokenizer needed
distilbert_tokenizer <- 'distilbert-base-uncased'

# loop through models and get predictions
distilbert_preds <- lapply(fine_tuned_models, function(model_idx) {
    # print current model
    print(paste0("Current Model: ", model_idx))
    
    # get the corresponding tokenizer
    tokenizer <- transformer$AutoTokenizer$from_pretrained(distilbert_tokenizer)
    
    # specify the locally saved model
    distilbert_model <- here(model_idx)
    
    # get the fine-tuned model
    model <- (transformer$AutoModelForSequenceClassification$from_pretrained(
        distilbert_model, num_labels = 2))
    
    # create pipeline for classification (predicts LABEL_0 or LABEL_1)
    pipeline <- transformer$TextClassificationPipeline(model = model,
                                                       tokenizer = tokenizer,
                                                       top_k = 1)
    
    # shorten model ID
    model_idx_short <- gsub("distilbert_models/", "", model_idx)
    # get predictions
    distilbert_preds <- pipeline(data$sentence, top_k = 1) |>
        bind_rows() |>
        mutate(label = case_when((label == "LABEL_0") ~ "no",
                                 (label == "LABEL_1") ~ "yes")) |>
        rename(!!paste0("label_", model_idx_short) := label,
               !!paste0("score_", model_idx_short) := score)
    
})
[1] "Current Model: distilbert_models/criticism"
[1] "Current Model: distilbert_models/example"
[1] "Current Model: distilbert_models/importance_and_relevance"
[1] "Current Model: distilbert_models/materials_and_methods"
[1] "Current Model: distilbert_models/praise"
[1] "Current Model: distilbert_models/presentation_and_reporting"
[1] "Current Model: distilbert_models/results_and_discussion"
[1] "Current Model: distilbert_models/suggestion_and_solution"
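The same loop body can be sketched directly in Python. The transformers calls are the ones the R code reaches through reticulate; the helper names (flatten_prediction, to_columns, classify) and the mock scores are assumptions made for illustration. Depending on how top_k reaches the pipeline, each sentence may come back either as a single dict or as a one-element list of dicts, so the sketch accepts both shapes.

```python
from pathlib import Path

# Mapping used in the R code above: LABEL_0 -> "no", LABEL_1 -> "yes".
LABEL_NAMES = {"LABEL_0": "no", "LABEL_1": "yes"}


def flatten_prediction(prediction):
    """Return the top prediction as a dict, whether it arrives bare or in a list."""
    if isinstance(prediction, dict):
        return prediction
    return prediction[0]


def to_columns(raw_predictions, category):
    """Turn raw pipeline output into label_<category> and score_<category> columns."""
    labels, scores = [], []
    for prediction in raw_predictions:
        top = flatten_prediction(prediction)
        labels.append(LABEL_NAMES[top["label"]])
        scores.append(top["score"])
    return {f"label_{category}": labels, f"score_{category}": scores}


def classify(sentences, category, models_root="distilbert_models"):
    """Classify sentences with one locally saved fine-tuned model (hypothetical helper)."""
    # Imported here so the helpers above can be used without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        TextClassificationPipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        str(Path(models_root) / category), num_labels=2
    )
    pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, top_k=1)
    return to_columns(pipe(list(sentences)), category)


# Mock pipeline output with made-up scores, showing both possible shapes.
mock_output = [
    [{"label": "LABEL_1", "score": 0.97}],
    {"label": "LABEL_0", "score": 0.88},
]
columns = to_columns(mock_output, "praise")
print(columns)
```

Calling classify() once per category and collecting the returned dictionaries gives the Python counterpart of the list of data frames built by lapply() above.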
# combine with previous results
data <- data |>
    bind_cols(distilbert_preds)
# get variable names of all label categories
names_labels <- paste0("label_", fine_tuned_categories)

# create well-formatted tables showing sentences
# and predictions for each category
for (i in names_labels) {
    
    # select one category at a time
    data_category <- select(data, sentence, !!i)
    
    # create and print table in html format
    kable(data_category,
          table.attr = "style = \"color: black;\"") |> 
        kable_styling("striped", full_width = TRUE) |>
        column_spec(1, width_min = '5in') |> 
        column_spec(2, width_min = '3.5in') |>
        print()
    
}
Predicted labels ("yes"/"no") for the 20 example sentences, by category. The sentences are numbered in the order in which they appear in sentences_example.csv:

 1. The paper presents a comprehensive review of the latest clinical trials investigating the efficacy of a novel drug in treating a specific type of cancer, providing a valuable resource for researchers and clinicians in the field.
 2. The authors are to be commended for their meticulous selection of studies and rigorous evaluation of the methodology employed in each trial.
 3. However, one suggestion for improvement would be to include more specific examples of patient demographics and treatment protocols for a clearer understanding of the interventions.
 4. The paper effectively highlights the relevance of the topic by discussing the increasing incidence of the targeted cancer and the limited treatment options currently available.
 5. The methods section is well-detailed, outlining the inclusion and exclusion criteria, outcome measures, and statistical analyses performed.
 6. It would be beneficial for the authors to provide more information about the potential side effects or adverse events associated with the novel drug to further evaluate its safety profile.
 7. The authors demonstrate a critical evaluation of the limitations of the included trials, including potential biases and confounding factors, which adds credibility to the analysis.
 8. The paper could benefit from a clearer presentation of the overall results, including summary tables or figures to aid in the interpretation of the findings.
 9. The statistical analysis performed by the authors reveals a significant improvement in overall survival rates for patients receiving the novel drug compared to standard treatment.
10. The paper’s discussion section effectively contextualizes the results within the current treatment landscape, discussing the potential implications for clinical practice and future research directions.
11. One suggestion for improvement would be to include a brief description of the mechanism of action of the novel drug to enhance the readers’ understanding of its therapeutic potential.
12. The authors’ inclusion of a sensitivity analysis to assess the robustness of the results further strengthens the reliability of their findings.
13. The paper successfully addresses a significant research gap by evaluating the efficacy of the novel drug in a specific patient population that has been understudied in previous trials.
14. The authors’ clear and concise reporting of the study design and patient characteristics allows for easy replication of the research in future investigations.
15. However, it would be valuable for the authors to discuss potential limitations related to the generalizability of the findings, considering the specific patient population included in the trials.
16. The paper’s conclusion provides a succinct summary of the study’s main findings and underscores the potential of the novel drug to revolutionize the treatment of the targeted cancer.
17. The authors are praised for their comprehensive search strategy, including multiple databases and manual screening of reference lists, ensuring a thorough inclusion of relevant studies.
18. However, it would be beneficial for the authors to provide more information about the quality assessment tools used to evaluate the included trials’ risk of bias.
19. The paper’s logical structure and clear subheadings facilitate an organized and easy-to-follow reading experience.
20. Overall, this paper significantly contributes to the field, offering compelling evidence for the efficacy of the novel drug and emphasizing the need for further research to optimize its use in clinical practice.

sentence no.                      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
label_criticism                   no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no
label_example                     no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no  no
label_importance_and_relevance    yes no  no  yes no  yes no  no  no  yes yes no  yes yes no  yes no  no  no  yes
label_materials_and_methods       yes yes yes no  yes no  yes no  yes no  no  yes yes yes yes no  yes yes no  no
label_praise                      yes yes no  yes yes no  yes no  yes yes no  no  yes yes no  yes yes no  yes yes
label_presentation_and_reporting  no  no  no  no  yes yes no  yes no  no  yes no  no  no  no  no  no  yes yes no
label_results_and_discussion      no  no  no  no  yes no  yes yes no  yes no  yes no  no  yes yes no  no  no  no
label_suggestion_and_solution     no  no  yes no  no  yes no  yes no  no  yes no  no  no  yes no  no  yes no  no
# print session information for reproducibility
sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] kableExtra_1.3.4 here_1.0.1       reticulate_1.28  lubridate_1.9.2 
 [5] forcats_1.0.0    stringr_1.5.0    dplyr_1.1.2      purrr_1.0.1     
 [9] readr_2.1.4      tidyr_1.3.0      tibble_3.2.1     ggplot2_3.4.2   
[13] tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  xfun_0.39         lattice_0.20-45   colorspace_2.1-0 
 [5] vctrs_0.6.2       generics_0.1.3    htmltools_0.5.5   viridisLite_0.4.2
 [9] yaml_2.3.7        utf8_1.2.3        rlang_1.1.1       pillar_1.9.0     
[13] glue_1.6.2        withr_2.5.0       lifecycle_1.0.3   munsell_0.5.0    
[17] gtable_0.3.3      rvest_1.0.3       htmlwidgets_1.6.1 evaluate_0.21    
[21] knitr_1.43        tzdb_0.4.0        fastmap_1.1.1     fansi_1.0.4      
[25] highr_0.10        Rcpp_1.0.10       scales_1.2.1      webshot_0.5.4    
[29] jsonlite_1.8.4    systemfonts_1.0.4 hms_1.1.3         png_0.1-8        
[33] digest_0.6.31     stringi_1.7.12    grid_4.2.3        rprojroot_2.0.3  
[37] cli_3.6.1         tools_4.2.3       magrittr_2.0.3    pkgconfig_2.0.3  
[41] Matrix_1.5-3      xml2_1.3.4        timechange_0.2.0  rmarkdown_2.21   
[45] svglite_2.1.1     httr_1.4.6        rstudioapi_0.14   R6_2.5.1         
[49] compiler_4.2.3   
cat("Code executed successfully on", date())
Code executed successfully on Mon Jun 26 10:57:56 2023

Predict Sentences using Python

Users can also apply one or more of the fine-tuned category models to review sentences directly in Python. Below, we provide a minimal working example using the transformers library.

## load general libraries
import pandas as pd

## load relevant classes and functions from transformers library
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

## print version of libraries for reproducibility
from importlib.metadata import version

print("transformers", version('transformers'))
print("pandas", version('pandas'))

## load tokenizer (same for all models handled here)
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')

## specify the models to be applied in a list
jif_models = ["example", "criticism", "suggestion_and_solution"]

## three example sentences / replace with sample data in a list format
text_data = ["The introduction effectively sets the context and research objectives, but could benefit from further elaboration on the specific research questions.",
             "The methodology is robust and well-designed, incorporating a combination of field observations and data analysis, enhancing the reliability of the findings.",
             "However, additional details on the sampling techniques and statistical analyses used would strengthen the methodology section."]

## setup an empty dictionary for results storage
results = dict()

## set up a loop through jif models
for model_idx in jif_models:
    # load the model
    current_model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert_models/" + model_idx, num_labels = 2)
    # run classification pipeline
    classifier = pipeline("text-classification",
                          model = current_model,
                          tokenizer = tokenizer)
    # get sentence predictions and store them into a dictionary
    results[model_idx] = pd.DataFrame(classifier(text_data))
    # recode string labels as integers (1 = category applies, 0 = it does not)
    results[model_idx]["label"] = (results[model_idx]["label"] == "LABEL_1").astype(int)
    # export the results into csv
    results[model_idx].to_csv("data_results_" + model_idx + ".csv",
                              index = False)
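
The loop above stores one data frame per category. To inspect all categories side by side, the per-model results could be merged into a single table with one row per sentence. The following is a minimal sketch, not part of the original workflow: combine_predictions is a hypothetical helper, and the sentences and predictions in the example are illustrative placeholders rather than real model output. It assumes each data frame has a recoded 0/1 label column and the same row order as the input sentences.

```python
import pandas as pd


def combine_predictions(sentences, results):
    """Return one row per sentence and one 0/1 column per category.

    Hypothetical helper: `results` is assumed to map category names to
    data frames with a `label` column, in the same order as `sentences`.
    """
    combined = pd.DataFrame({"sentence": list(sentences)})
    for category, preds in results.items():
        # guard against predictions that do not line up with the sentences
        if len(preds) != len(combined):
            raise ValueError(
                f"{category}: expected {len(combined)} rows, got {len(preds)}"
            )
        combined[category] = preds["label"].to_numpy()
    return combined


# illustrative placeholder data (NOT real model output)
sentences = ["Sentence A.", "Sentence B.", "Sentence C."]
results = {
    "example": pd.DataFrame({"label": [0, 0, 1], "score": [0.9, 0.8, 0.7]}),
    "criticism": pd.DataFrame({"label": [1, 0, 0], "score": [0.6, 0.9, 0.8]}),
}

combined = combine_predictions(sentences, results)
print(combined)
```

With the real objects from the loop, the call would be combine_predictions(text_data, results). Note that the score returned by the pipeline refers to the predicted label, so it is left out of the combined table here.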