A Theoretical, Empirical, and Methodologically Based Instrument to Assess the Risk of Violent Jihadist Radicalization in Prisons: The DRAVY-3

ABSTRACT A main goal of the Spanish National Counter-Terrorism Strategy is to improve the detection and control of inmates who may participate in or collaborate with terrorist groups after release (convicted and detained inmates linked to terrorist acts), as well as of those individuals who, during their stay in prison, become involved in violent extremist recruitment or indoctrination. This manuscript introduces an instrument for assessing the risk of violent jihadist radicalization in prisons, the Detection of Violent Jihadist Radicalization tool (DRAVY-3, for its Spanish initials). The instrument was built on existing tools, a review of the literature, the experience of prison staff, indicators suggested by researchers from two institutions, a field study conducted with Muslim inmates (jihadists and non-jihadists), and the results of six applications of two preliminary versions. The DRAVY-3 was tested by evaluating 570 inmates from five groups (related to jihadist terrorism and controls). The analyses showed that the indicators are distributed into three scales: violence in general, violence of jihadist aetiology, and radicalism. The analyses also showed that a combination of indicators forms an index that predicts the level of danger. The results demonstrate the internal strength of the instrument and its capacity to detect potential radicalization leading to violence.


Introduction
Terrorism constitutes one of the greatest threats to the stability of nations and to peaceful coexistence between citizens. Therefore, the States participating in the Organization for Security and Cooperation in Europe (OSCE) signed an Agreement on Preventing and Combating Terrorism, in which they condemn terrorism in all its forms and recognize that it requires a coordinated response (see United Nations Security Council Resolution 1373, 2001) because it constitutes a threat to international peace and security. In line with this strategy, various resolutions of the European Parliament attribute special relevance to the fight against international terrorism and to the prevention of the radicalization and recruitment of citizens by terrorist organisations (2015/2063 INI). 1 Spain, which has been a victim of ethno-separatist terrorism in the past (e.g., ETA) and of jihadist terrorism more recently (e.g., the killing of 193 people at Madrid's Atocha station in March 2004, or the fifteen murders in Las Ramblas in Catalonia in August 2017), updated its National Counter-Terrorism Strategy (Estrategia Nacional Contra el Terrorismo, or ENCOT) in 2019, stating as a general objective "the neutralisation of the threat represented by terrorism against Spanish citizens and interests within and outside of the borders, reducing the vulnerability of society and confronting the radicalization processes that lead to violent extremism." One of its objectives is "boosting the detection and control of those in prison who may participate in or collaborate with terrorist groups or violent extremist organisations, strengthening the coordination and cooperation between the prison administrations and the security bodies of the State intelligence agencies."
Violent extremism of a jihadist nature 2 can spread throughout penitentiary institutions. There is controversy over whether prisons are an important source of radicalization. For example, although some studies consider that prisons are not so evidently an environment of radicalization, 3 others suggest that this connection does exist but is modest, 4 and many authors maintain that prison is one of the most relevant contexts of radicalization. 5 There is agreement, however, that the conditions surrounding the deprivation of liberty may favour radicalization processes, which are influenced by institutional, social and individual factors such as overcrowding, group pressure, and the presence of charismatic inmates who exert leadership. 6 According to Warnes & Hannah, 7 in the European context discrimination by society, by prison staff, and by the rest of the prison population is especially relevant. These factors, combined with a lack of personal identity, create a psychological fragility that makes some inmates vulnerable to radicalization processes.
Thus, governments and various international agencies have highlighted the importance of the prison context in the fight against this phenomenon, 8 and of the individualised evaluation of the risk of radicalization and terrorism, 9 while protecting the rights of inmates. 10 In Spain, one of the strategic focuses of the ENCOT is to promote and update tools for detecting and assessing the risk of jihadist radicalization leading to violence in the penitentiary setting, tools that evaluate the processes by which some individuals become increasingly motivated to use violence to achieve their ideals. 11 In this way, the most drastic extremist conversions could be prevented (especially those that culminate in violence), and those who, once convicted of terrorist activities, are prone to recidivism could be detected, 12 in accordance with the principle of behavioural persistence. 13 Hence the importance of individualised assessment of the risk of violence, an evidence-based practice in various professional spheres. 14 In Spain, the development of the police instrument for assessing the risk of recidivism in gender violence against women stands out, 15 and this experience can be very useful for creating assessment instruments for other types of risk. After all, as Augestad-Knudsen indicates, 16 the more effort countries make to identify adequate strategies to prevent terrorist attacks, the more pivotal individual risk assessment tools have become in the fight against terrorism.
In the field of terrorism, risk assessment tools seek to measure and understand the extent to which an individual is susceptible to radicalization that could lead, directly or indirectly, to violent actions. These tools identify risk and protective factors that are crucial for prevention and rehabilitation. For more details about the nature of these instruments and how they work, see, for example, Scarcella. 17 At present, there are various instruments for assessing the risk of radicalization leading to violence in prisons, the most frequently used being the VERA-2R, the ERG22+, the RRAP and the MLG (see a review in Scarcella et al. or Van der Heide et al.). 18 According to Cornwall & Molenkamp, 19 these tools can be useful for (1) organising information (ordering and classifying the relevant information for each case); (2) decision-making (helping to make decisions regarding the enforcement of sentences); (3) rehabilitation (diagnosing needs or identifying factors that require intervention); (4) tracking individuals (monitoring cases to see how they evolve); and (5) sharing risk assessment information with other agents, such as law enforcement agencies.
However, these tools have also received criticism and suggestions for improvement, such as: (1) insufficient empirical evidence to support their validity; (2) susceptibility to observer bias; (3) failure to take into consideration the subjective experiences of the individuals evaluated (note that less than three percent of scientific papers focused on violent extremism include empirical data); 20 (4) insufficient evidence to determine to what extent they can predict future behaviour; and (5) their use requires training and substantial resources. 21 In short, all the instruments lack a broad empirical foundation, and a genuine evaluation of those foundations is still necessary. 22 Our instrument is intended to improve assessment tools in several ways. First, by including empirical evidence obtained through face-to-face interviews with inmates who have the same profile as those who will be evaluated. Second, by developing a strategy to avoid or reduce observer bias. Third, and related to the first point, by using the information from these interviews to take into account the subjective experiences of the inmates. And fourth, by using a sophisticated approach to the data analyses.

The detection of violent Jihadist radicalization tool (DRAVY)
In accordance with the objectives established in the ENCOT, the Spanish Secretary General of Penitentiary Institutions (SGIP, for its Spanish initials) has developed initiatives to detect and prevent processes of recruitment and radicalization inside Penitentiary Centres, mainly among Muslim inmates, owing to the growing threat of jihadist terrorism. Among other measures, Service Instruction 3/2018 introduced the first version of an instrument for evaluating the risk of violent radicalism (Detection of Violent Radicalism of Jihadist aetiology, DRAVY-1), based on existing tools, especially the VERA-2R. 23 The DRAVY-1 included thirty-nine risk indicators grouped into two sets: twelve for radicalization leading to violence, and twenty-seven for proselytism and recruitment (see a review in Fernandez). 24 The instrument was applied three times (once every six months) to inmates convicted of jihadist terrorism (group A), inmates convicted for reasons unrelated to jihadist terrorism but suspected of radicalizing others in prison (group B), and inmates particularly vulnerable to recruitment for further radicalization (group C), according to the internal classification of inmates (see Santos-Hermoso et al., 25 for more details). This first version of the instrument had a series of limitations: (1) the indicators were based mainly on the experience of the professionals; (2) the analyses did not include a quantitative rating of the indicators (weighting) nor an algorithmic formula providing a final score on either of the two scales, leaving it to the evaluators, with their prior knowledge and experience, to determine each inmate's level of risk, without mathematical parameters of reliability or validity; 26 and (3) there was no control group to establish a baseline for comparison.
To overcome these limitations, a second version of the instrument was developed (DRAVY-2), 27 based on the results of the previous analyses and a review of the literature. The indicators were reformulated and increased to fifty, grouped into three dimensions to classify the evaluated persons according to their level of (a) general violence exhibited at any given moment, especially extremist-ideological violence (twenty V indicators); (b) Islamic radicalization, especially proselytising (twenty-one R indicators); and (c) changes in daily habits (nine H indicators). The DRAVY-2 was applied to members of groups A, B and C (n = 249) and to a sample of Muslim inmates not included in these groups (controls, M; n = 283).
The rating scales of violence (V) and radicalism (R) were validated using multiple correspondence analysis (MCA), which served to calculate the weighting of each indicator and to establish three cut-off points classifying the inmates into four levels (Not detected, Low, Medium, and High); the instrument was then automated in a Microsoft Excel spreadsheet that yielded the quantitative results. Another addition in this second version was a first approximation to an index with potential predictive capacity, which might function as an index of the level of danger of the inmates. To that end, a preliminary index was developed according to the profile of twenty-nine cases selected for their level of danger related to terrorism (e.g., six of them had shown recidivism in terrorist actions, and nine had been singled out by the Intelligence Centre for Counterterrorism and Organized Crime [CITCO] as being of interest for monitoring outside of prison).
The analyses of the DRAVY-2 applications did not yet incorporate the indicators of changes in daily habits (H), since it was concluded that the remaining theoretical indicators (twenty V and twenty-one R) presented sufficient empirical support (construct validity) to establish scales that enabled the classification of the subjects on the basis of violence and radicalism; that the scales were reliable (KR20 > 0.6); that the new instrument had sufficient face and content validity; and that the index, which was predictive rather than merely classificatory, presented adequate parameters of predictive validity for the profile of the level of danger: AUC of 78 percent, sensitivity of 82.8 percent and specificity of 63.8 percent.
The DRAVY-2 presented various advantages over the DRAVY-1: (1) it represents a structured judgment that is easier to explain, shows less variability across evaluations, is more reliable, and does not depend on the subjectivity of the evaluator; (2) it includes empirical validation; (3) its indicators were fully adapted to the reality of Spanish prisons, which matters because each country has its own characteristics when it comes to terrorism; 28 (4) for the first time, the instrument included a preliminary version of an index of the level of danger; and (5) its ease of completion allows it to be filled out more often than biannually, which in turn helps to evaluate the efficacy of the different treatment programmes implemented for these groups of inmates.
Despite all these innovations, and although these two versions of the DRAVY were used in almost 1,500 evaluations of inmates from forty countries on four continents, work has continued to improve the tool. Instruments of this type need to be dynamic and subject to ongoing review, remaining open to the possibility that, over time, it may be necessary to incorporate new indicators, eliminate some that no longer serve a purpose, or assign them different weightings.
Accordingly, the usefulness of new indicators for future versions was explored through ad hoc internal research carried out by the SGIP in the spring and summer of 2020 on six new indicators proposed by the Community of Intelligence and Security (CISEG). These indicators could be linked to individuals who are radicalised or in the process of radicalization, as they were proposed by experts who had been radicalised in the past and who now help combat violent Islamic radicalization. In this (unpublished) research, 497 inmates were observed: 250 belonging to groups A, B and C, and 247 "controls" (M). The results indicated that four of the indicators (Table 1, R22 to R25) could be valid for measuring radicalization, insofar as statistically significant differences were obtained between the case group and the control group.
Replicating the methodology followed for the construction and validation of the DRAVY-2, a new study 29 performed in the second half of 2020 (the fifth periodic evaluation), involving 608 inmates (228 A, B and C, and 380 M), confirmed that the forty-one indicators of the DRAVY (twenty V and twenty-one R) once again formed the classificatory scales already identified, adding to the robustness of the instrument; the predictive scale was not recalibrated on this occasion. Since the nine indicators of "changes in behavioural habits" (H) and the four CISEG indicators had not been incorporated into the scales of violence and radicalization, the data from the fifth evaluation were also used to revise the methodology for constructing the DRAVY-2 scales, making progress in (a) the analysis of the new indicators; (b) the multiple imputation of "Not known" values; and (c) the calculation of the indicator weightings via Factor Analysis (FA) drawing on multidimensional Item Response Theory (IRT). These calculations confirmed that the four new CISEG indicators could be incorporated into a scale of radicalization, making it the only radicalization scale and doing away with the scale of vulnerability to recruitment. Owing to their response format, however, the nine H indicators were not incorporated into any of the scales, as the experts of the SGIP considered that they should remain in the DRAVY for descriptive purposes (i.e., qualitative interpretation).
Lastly, since 2019 the SGIP has been collaborating with a research group from the Universidad Nacional de Educación a Distancia (UNED), whose researchers conducted a series of face-to-face interviews with the goals of preventing radicalization leading to violence and contributing to the development of the risk assessment tool. Following the analysis of data from a sample of 204 inmates interviewed in thirty-five Spanish penitentiary centres (37 A; 78 B and C; and 89 M), the research team proposed seven new indicators that could serve to characterise the degree of radicalization of the subjects (see, e.g., Gómez et al.). 30 These indicators (Table 1, R26 to R32) were supported by previous findings indicating that expressions of admiration toward radical Islamist groups increased willingness to sacrifice for religion by increasing identity fusion (a visceral feeling of oneness with a group or ideology); 31 that feelings of injustice regarding the sentence imposed, hostility toward Western culture and society, and perceptions of a conflict between East and West were found to be factors of radicalization; 32 and that collecting direct expressions of personal visceral feelings of fusion is relevant, because a recent meta-analysis positioned identity fusion as the top risk factor for radicalization. 33 To the best of our knowledge, this is the first time that a series of indicators derived from ethnographic field research has been incorporated into an instrument of this kind. Finally, the experts at the SGIP proposed two new indicators, derived from the reports of the staff monitoring the inmates (Table 1, R33 and R34).

The present study
The present investigation consists of the implementation and testing of the most recent version of an instrument to assess the risk of violent jihadist radicalization in Spanish prisons, including the new indicators. The main objective was to determine whether this entire ensemble of indicators would make it possible to construct and validate a third version of the DRAVY that would improve upon the second, reorganizing the scales that capture indicators of violence in its different forms and of radicalization, and whether it is also possible to establish an index of the dangerousness of inmates. The specific objectives were thus (1) to confirm whether the indicators still form significant groups in the classificatory scales or dimensions detected in the previous studies; (2) to determine whether the sets of indicators (or scales) better distinguish the inmates in groups A, B and C from the inmates in the control group (M); (3) to determine whether the scales also adequately differentiate group A from groups B and C; and (4) to decide whether the indicators related to changes in daily routines (H) should be definitively incorporated into some kind of scale or should instead remain descriptive at the individual level, with a qualitative interpretation.

Procedure
A pilot form of the DRAVY-3 was completed in April and May 2021 in fifty-six Spanish penitentiary centres, including sixty-three indicators (see Table 1): twenty for violence (V), twenty-one for radicalization (R), nine for changes in habits (H), four CISEG (R22 to R25), seven ethnographic (R26 to R32), and two SGIP (R33 and R34).
The evaluations were completed by the treatment and security staff at the Penitentiary Centres; that is, the indicators were recorded collectively by multidisciplinary groups of professionals in accordance with Service Instruction 3/2018. A guide explaining each indicator in detail was developed and distributed, and the evaluators could consult the experts at the SGIP whenever they were in doubt about how to proceed.
Once the individual evaluations of each inmate were completed, they were submitted to the SGIP, where they were unified in a database and erroneous or missing data were cleaned (eliminating duplicate evaluations, correcting data on countries of origin and dates of birth, and filling blank fields with the value "Not known").

Sample
The sample included the evaluations of 582 inmates (2.1 percent women) from five groups: three groups of inmates related to jihadist terrorism and included in the file of inmates under special follow-up (FIES), namely groups A, B and C in accordance with the provisions of Instruction 2/2015 of Penal Institutions; and two non-FIES control groups, namely inmates under surveillance (S), who showed incipient signs of radicalization, and Muslim inmates who did not show any sign of radicalization (M). The M group was selected randomly in each centre, matching the socio-demographic variables of the other groups. The average age of the inmates was 36.6 years (SD = 10.161; range = 19-75). Regarding nationality, 24.9 percent were Spanish, 49.8 percent Moroccan, and 11.2 percent Algerian. The distribution of the evaluated inmates was 111 from group A (19.1 percent), forty-four from B (7.6 percent), sixty-seven from C (11.5 percent), 227 from S (39 percent) and 133 from M (22.9 percent).
The cases of women (twelve) were excluded from the analyses due to their low number and the special role played by women in the Islamic community; a separate study on radicalised women, leading to the construction and validation of a specific instrument, would be necessary. The calculations were therefore carried out on the 570 evaluations of male inmates.

Analysis strategy
For the descriptive analyses, frequency, central tendency and dispersion statistics were used. For the bivariate analyses, the Chi-Square test of independence was used, inspecting the adjusted standardized residuals (values ≥ 2 or ≤ −2) and calculating φ for 2 × 2 tables and Cramér's V for larger tables. When any cell had an expected frequency lower than five, Fisher's exact test was used. A significance level of .05 was adopted throughout.
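As a rough illustration of this bivariate procedure, the following Python sketch tests a single 2 × 2 table (indicator present/absent by group). It is not the authors' code: the function names are hypothetical, the adjusted residuals follow Haberman's standard formula, φ is the signed 2 × 2 coefficient, and the switch to Fisher's exact test is triggered whenever any expected count is below five. Only the standard library is used, so the one-degree-of-freedom p-value is obtained through the normal distribution.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]] (no continuity correction).

    Returns the statistic, its p-value (1 df), phi, the adjusted standardized
    residual of every cell, and the smallest expected count.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2, min_expected = 0.0, float("inf")
    residuals = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            min_expected = min(min_expected, expected)
            diff = table[i][j] - expected
            chi2 += diff ** 2 / expected
            # Haberman's adjusted residual: (O - E) / sqrt(E (1 - row/n) (1 - col/n))
            residuals[i][j] = diff / math.sqrt(
                expected * (1 - rows[i] / n) * (1 - cols[j] / n))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > x) = P(|Z| > sqrt(x))
    phi = (a * d - b * c) / math.sqrt(rows[0] * rows[1] * cols[0] * cols[1])
    return chi2, p_value, phi, residuals, min_expected


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value: sum of the hypergeometric probabilities of
    all tables with the observed margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    n, r1, c1 = a + b + c + d, a + b, a + c

    def prob(x):  # probability of the table whose top-left cell equals x
        return math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)

    p_obs = prob(a)
    support = range(max(0, r1 + c1 - n), min(r1, c1) + 1)
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9))


def compare_indicator(table, alpha=0.05):
    """Chi-square test, switching to Fisher's exact test when any expected count < 5."""
    chi2, p, phi, residuals, min_expected = chi_square_2x2(table)
    test = "chi-square"
    if min_expected < 5:
        p, test = fisher_exact_2x2(table), "fisher"
    flagged = [(i, j) for i in range(2) for j in range(2) if abs(residuals[i][j]) >= 2]
    return {"test": test, "chi2": chi2, "p": p, "phi": phi,
            "significant": p < alpha, "flagged_cells": flagged}


# Hypothetical counts: rows = FIES / non-FIES, columns = present / absent
print(compare_indicator([[60, 151], [30, 329]]))
```

For tables larger than 2 × 2, Cramér's V would be sqrt(χ² / (n · (min(r, c) − 1))), which this sketch does not compute.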
For the construction of the new classificatory scales, the missing values were multiply imputed and the responses to the indicators were dichotomised. The empirical structure of the groups of indicators was then studied via multidimensional IRT, since this provides an appropriate approach to factor analysis for dichotomous data.
For the construction of the scale predicting the level of danger, the evaluations of thirty-six inmates from the three FIES groups (A, B and C) characterised by an elevated level of danger were compared with those of the remaining A-B-C inmates (n = 175). This level of danger was determined by the experts of the SGIP, who singled out these male inmates because they were under special surveillance for relevant circumstances (e.g., recidivism). The Chi-Square test of independence was applied, and the odds ratio (OR) was obtained for the indicators that showed significant differences between the two groups. Once the scale was constructed, sensitivity, specificity, the positive predictive value (PPV) and the negative predictive value (NPV) were calculated. The area under the ROC curve (AUC) was analysed using the score of the instrument and the dependent variable in dichotomous format (1 = Danger; 2 = No danger), and two cut-off points were established to determine three levels of risk: not detected, medium and high.
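The indicator-selection step for the danger scale (an indicator is kept only when its odds ratio is significant, i.e. its confidence interval excludes 1) can be sketched as follows. The text does not say how the interval was computed; the Woolf (log-scale) 95% interval and the 0.5 correction for empty cells used here are assumptions, the function names are hypothetical, and the cell layout assumes the 36 high-danger inmates versus the 175 remaining FIES inmates.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio of a 2x2 table and its Woolf (log-scale) confidence interval.

    a = high-danger inmates with the indicator present, b = high-danger with it absent,
    c = remaining FIES inmates with it present,        d = remaining FIES with it absent.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: one common convention, assumed here
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def keep_indicator(a, b, c, d):
    """Retain an indicator only if the confidence interval of its OR excludes 1."""
    _, lower, upper = odds_ratio_with_ci(a, b, c, d)
    return not (lower <= 1 <= upper)


# Hypothetical counts (36 high-danger vs 175 remaining FIES inmates)
print(odds_ratio_with_ci(20, 16, 20, 155), keep_indicator(20, 16, 20, 155))
```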

Descriptive analysis of the indicators
The distributions of the responses to all the indicators of the pilot DRAVY-3 can be seen in Appendix 1 (see supplementary materials), which gives the frequencies and percentages of the responses "presence," "absence" and "not known," plus, for the indicators of changes in behavioural habits (H), the affirmative responses "start," "maintenance" and "cessation." The table shows the missing values and the low frequencies of occurrence of many of the indicators, which justify carrying out imputations prior to the factor analyses. To dichotomise the H indicators, maintenance was considered the main category (= 1), contrasted against the remaining categories (= 2).
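A minimal sketch of this recoding, assuming the raw responses arrive as the text labels listed in Appendix 1. The function name is hypothetical; returning None for "not known" (rather than treating it as absence) reflects the later decision to impute those values instead of equating them with "absent".

```python
def dichotomise(indicator_type, response):
    """Recode a raw DRAVY-3 response into 1 (category of interest) or 2 (rest).

    indicator_type: "V", "R" or "H"; response: the evaluator's answer as text.
    "Not known" is returned as None so that it can be imputed later.
    """
    response = response.strip().lower()
    if response == "not known":
        return None
    if indicator_type == "H":
        # Changes-in-habits items: "maintenance" = 1 versus every other answer = 2
        return 1 if response == "maintenance" else 2
    # V and R items: presence = 1 versus absence = 2
    return 1 if response == "presence" else 2


print([dichotomise("H", r) for r in ("start", "maintenance", "cessation", "not known")])
```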

Relevance of the indicators
To check whether the proposed indicators were pertinent for constructing the classification scales of the instrument, we tested the degree to which they differentiated between the three FIES groups (A-B-C) and the two non-FIES groups or controls (S-M). To this end, a dichotomous variable was created that grouped the 211 FIES inmates against the 359 non-FIES inmates, and 2 × 2 contingency tables were calculated using the dichotomised indicators. The resulting values, displayed in Table 1, show that most of the indicators occurred significantly more frequently among the FIES inmates; the magnitudes of these differences are shown in the columns summarising the test statistics.
Regarding the twenty indicators of violence, eight characterised the FIES group in particular, including most of those that the experts had suggested as related to violence of ideological aetiology (V7-8-9-10-15-17); the exceptions were V11-19-20, which, although they showed higher percentages in the FIES group, did not prove significant, most likely due to their low frequency of occurrence in both groups. In the subset of general violence indicators, significant differences were found in V4 and V12. No indicator characterised the non-FIES group.
In turn, among the indicators that the experts suggested as related to radicalism (R), now including the indicators of maintenance of behaviours and those proposed by the CISEG, the SGIP and UNED, most (thirty-nine of forty-three) were significantly more present in the FIES group. R6 did not prove significant (most likely because this evaluation was applied during Ramadan), and neither did R14, R15 or R21.
When comparing the degree of presence of the indicators across the different groups, only ten indicators did not differentiate between them (see Appendix 2). Many of the indicators (46) characterise group M: these inmates exhibit the indicators significantly less frequently than would be expected by chance, with the exception of V6, which shows an upward tendency. Group S follows in the same vein, showing an upward tendency in two of the general violence indicators and a downward tendency in two of the extremist-ideological violence indicators; this downward trend is shared by several of the radicalization indicators.
In contrast to the non-FIES groups, the three FIES groups had already been observed to show a greater presence of the indicators, although some differences among them were now observed. Group C exhibited the fewest significant indicators (sixteen), all of them with an upward tendency, some of violence and others of radicalization, especially those pertaining to vulnerability. Group B was characterised by a larger set of indicators (thirty-four), all of which also showed an upward tendency. Finally, group A accumulated the greatest number of significant indicators: seven indicators of general violence with a downward tendency, three of extremist-ideological violence, and many of the radicalization indicators (thirty-four) with an upward tendency.

Classificatory scales
Turning to the most important section of this work, in order to (a) verify whether the selected indicators could be categorised into the subscales conceptually proposed by the team of experts; (b) explore whether other groupings or factors existed; and (c) in any case, study the contribution of each indicator to the subscales, the IRT approach to factor analysis for dichotomous data was applied once the missing values had been imputed.
In the preliminary versions of the DRAVY, missing cases were handled by equating "not known" with "absent," on the assumption that if the evaluators did "not know" the response, it was because the inmate had not manifested the behaviour. This assumption is not free of bias, however: the behaviour could have been manifested without being observed, which is why it may be useful to turn to alternatives to listwise deletion. One classic alternative, imputation by the mean (or by the mode for dichotomous data), reduces the variability of the variables analysed and underestimates the standard errors, and is widely discouraged. Therefore, in this instance, a preliminary study of the type of data loss was carried out and, since the data could not be assumed to be missing completely at random (MCAR), multiple imputation was chosen. 34 Following the recommendation of Graham et al., 35 who propose a minimum of twenty imputed datasets, thirty databases were imputed. The MICE package in R was used, with chained equations via binary logistic regression as the imputation method. After imputing the thirty databases, the substantive model (FA for dichotomous data based on the multidimensional IRT approach) was fitted separately to each database, analysing the scales of violence and radicalization in line with their theoretical framework. Once the sixty factorial solutions had been obtained (thirty for the scale of violence and thirty for the scale of radicalization) and inspected to ensure there were no meaningless (out-of-range) values, a single model was obtained by averaging the factorial weights. 36
Within the set of indicators of violence, V20 was discarded due to its low contribution (<0.3; see Appendix 3) to all of the resulting factors, and two scales emerged. The first is a set of thirteen indicators of general violence (GV; V1-2-3-4-5-6-12-13-14-16-17-18-19), made up of the eleven proposed in the literature as indicators of general violence plus two that were in principle thought of as violence of radical aetiology (V17 and V19), with a reliability of KR20 = .784. The second factor comprises six indicators of violence of extremist-ideological aetiology (EIV; V7-8-9-10-11-15), with a reliability of KR20 = .531.
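The pooling step described above (thirty factorial solutions per scale, inspected for out-of-range values and then averaged into a single model) can be sketched in Python as below. The imputation itself was done with MICE in R, and the IRT-based factor models are not reproduced here. The sketch assumes that each fitted solution is available as a mapping from indicator to loading and that the solutions are already aligned (same factor, same sign), which the text does not discuss; the admissible range [−1, 1] and the function name are assumptions.

```python
from statistics import mean


def pool_loadings(solutions, lower=-1.0, upper=1.0):
    """Average loadings across the factor solutions fitted to each imputed dataset.

    solutions: one dict per imputed database, mapping indicator -> loading.
    Assumes the solutions are already aligned (same factor, same sign).
    Raises ValueError if any loading falls outside [lower, upper].
    """
    for k, solution in enumerate(solutions):
        out_of_range = [ind for ind, w in solution.items() if not lower <= w <= upper]
        if out_of_range:
            raise ValueError(f"imputed dataset {k}: out-of-range loadings {out_of_range}")
    indicators = solutions[0].keys()
    return {ind: mean(solution[ind] for solution in solutions) for ind in indicators}


# Hypothetical loadings from two of the thirty imputed databases
print(pool_loadings([{"V1": 0.42, "V2": 0.61}, {"V1": 0.46, "V2": 0.57}]))
```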
In the set of radicalization indicators (excluding the nine indicators of behavioural change), a single classificatory scale of thirty indicators emerged: R15, R20 and R21 were discarded in the first stage, and R17 in the second, owing to their low contributions to the factor (<0.3). The resulting weights are displayed in Appendix 4, and the scale's reliability was KR20 = .886.
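The retention rule and the reliability coefficient reported for the three scales can be illustrated with a short sketch. KR-20 is computed here with population variances, one common convention (the text does not state which variance was used), and the 0.3 threshold is applied to each indicator's largest absolute loading across the factors; the function names are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 reliability for 0/1 item responses.

    responses: list of respondents, each a list of 0/1 scores on the same k items.
    KR20 = k / (k - 1) * (1 - sum(p_j * q_j) / variance of total scores).
    """
    k, n = len(responses[0]), len(responses)
    sum_pq = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n  # proportion scoring 1 on item j
        sum_pq += p * (1 - p)
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_pq / total_var)


def retained_items(loadings, threshold=0.3):
    """Keep indicators whose largest absolute loading reaches the threshold."""
    return [item for item, ws in loadings.items() if max(abs(w) for w in ws) >= threshold]


print(kr20([[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]))
print(retained_items({"V1": [0.55, 0.10], "V20": [0.21, 0.18]}))
```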
Substituting the values of presence with the factorial weights, the scores of all the inmates were calculated on each scale and rescaled out of 10 to make them easier to interpret. To establish categories in these empirical scales, different models of cut-off points were explored, paying attention to the percentage of subjects left in each category, and the cut-off points that classified the inmates as shown in Table 2 were selected. Because of the high number of inmates who scored 0 on the EIV scale (66.1 percent), the number of levels of this scale was reduced to three (Not detected, Low and High), whereas five levels were assigned to the GV and R scales (Not detected, Low, Medium, High, and Extreme).
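A sketch of how a scale score might be obtained and banded. The text says only that present indicators were replaced by their factorial weights and that scores were rescaled out of 10; the assumption here is that a score of 10 corresponds to every indicator of the scale being present. The cut-off values below are placeholders, not the Table 2 values, and all names are hypothetical.

```python
from bisect import bisect_right


def scale_score(presence, weights):
    """Weighted scale score rescaled to 0-10.

    presence: dict indicator -> 1 (present) or 0 (absent), after imputation.
    weights:  dict indicator -> pooled factorial weight.
    Assumption: 10 corresponds to every indicator of the scale being present.
    """
    raw = sum(w for ind, w in weights.items() if presence.get(ind, 0) == 1)
    return 10 * raw / sum(weights.values())


def classify(score, cutoffs, labels):
    """Return labels[k], where k is the number of cut-offs <= score.

    cutoffs: ascending lower bounds of every level except the first.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    return labels[bisect_right(cutoffs, score)]


GV_LABELS = ["Not detected", "Low", "Medium", "High", "Extreme"]
GV_CUTOFFS = [0.01, 2.5, 5.0, 7.5]  # placeholders, NOT the Table 2 values

score = scale_score({"V1": 1, "V2": 0, "V3": 1}, {"V1": 0.5, "V2": 0.3, "V3": 0.2})
print(round(score, 2), classify(score, GV_CUTOFFS, GV_LABELS))
```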

Scale of the level of danger
To answer the question of what differentiates the "worst FIES" from the rest of the FIES inmates (groups A, B, and C), the subgroup of inmates with an increased level of danger (thirty-six) was compared with the remaining FIES inmates in the sample (175), to better identify the indicators that define the profile of the FIES inmates with the highest level of danger. The results, shown in Appendix 5, reveal twenty-three significant risk indicators (three from the scale of extremist-ideological violence and twenty from that of radicalism) and one protective indicator (from the scale of general violence, namely V6, which characterised the group of the remaining FIES). Although the R6 indicator was also significant, it was disregarded because the confidence interval of its odds ratio (OR) contained 1. These twenty-four indicators were used to construct a new "level of danger" variable for each subject by substituting the presence values of the indicators with their ORs and summing these values for each subject (V6, which denoted protection, was subtracted rather than added). The resulting quantitative variable had a mean of 9.76 (SD = 11.88), a minimum of −1.166, and a maximum of 58.34. Comparing this variable with membership in the group of FIES inmates with an increased level of danger yielded a ROC curve with an area under the curve (AUC) of .817, a standard error of .033, and a confidence interval of .751-.888 (see Figure 1).
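The construction of the danger variable, the check on whether an OR's confidence interval contains 1, and the AUC can be sketched as follows. The ORs and the weight subtracted for V6 are hypothetical; the Woolf (log-scale) confidence interval is an assumption, since the text does not state which method was used; and the AUC is computed as the Mann-Whitney probability that a randomly chosen high-danger inmate outscores a randomly chosen other inmate, which equals the area under the ROC curve.

```python
"""OR-weighted danger score, OR confidence interval, and ROC AUC (sketch)."""
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a = high-danger inmates with the indicator, b = high-danger without it,
    c = other FIES inmates with it, d = other FIES inmates without it.
    All cells must be non-zero (a common fix is adding 0.5 to each cell).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)


def danger_score(present: set[str], risk_or: dict[str, float],
                 protective: dict[str, float]) -> float:
    """Sum the ORs of the risk indicators present; subtract protective weights."""
    score = sum(w for item, w in risk_or.items() if item in present)
    score -= sum(w for item, w in protective.items() if item in present)
    return score


def auc(pos: list[float], neg: list[float]) -> float:
    """P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # An indicator is kept only if its OR interval excludes 1 (cf. R6).
    or_, lo, hi = odds_ratio_ci(10, 5, 5, 10)
    print(f"OR = {or_:.2f}, CI = ({lo:.2f}, {hi:.2f}), keep = {not lo <= 1 <= hi}")

    risk = {"V7": 4.1, "R3": 2.6, "R9": 3.2}  # hypothetical ORs
    protective = {"V6": 1.2}                  # hypothetical weight for V6
    dangerous = [danger_score(s, risk, protective) for s in [{"V7", "R3"}, {"R9"}]]
    others = [danger_score(s, risk, protective)
              for s in [{"V6"}, {"R3"}, {"V7", "V6"}, {"R3", "R9", "V6"}]]
    print(auc(dangerous, others))  # 0.875
```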
Balancing the scores of the thirty-six subjects with an increased level of danger (which ranged from 7.950 to 57.172) against the frequency distribution of the FIES inmates' scores on this new scale, a hierarchy of three levels of danger was established: high, low, and not detected. After trying out different combinations, the cut-off point for the "high" level of danger was set at the score that captured half of the subjects with an increased level of danger (26.642), which in turn left 90 percent of the sample below it; the cut-off point for the "low" level was set at the score that left 79 percent of the sample below it (17.795), which captured another twelve subjects with an increased level of danger. Consequently, the six inmates with an increased level of danger who scored lowest on this scale (16.7 percent) were misclassified by this model. This selection of cut-off points produced the grouping of FIES inmates shown in Table 3. Dichotomising the level of danger (high and low = yes; not detected = no), the model correctly classified 83.3 percent of the inmates with an increased level of danger (sensitivity) and 66.9 percent of those who did not exhibit an increased level of danger (specificity), with a positive predictive value (PPV) of .341 and a negative predictive value (NPV) of .951.
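The three-level rule and the dichotomised diagnostic parameters can be reproduced in a few lines. The cut-offs are those reported above; the 2x2 counts (30 true positives, 6 false negatives, 58 false positives, 117 true negatives) are back-calculated from the reported percentages for the 211 FIES inmates rather than taken from Table 3, and placing a score equal to a cut-off in the higher level is an assumption.

```python
"""Three-level danger classification and diagnostic metrics (sketch)."""

HIGH_CUT = 26.642  # cut-off reported for the "high" level
LOW_CUT = 17.795   # cut-off reported for the "low" level


def danger_level(score: float) -> str:
    """Assumption: a score equal to a cut-off goes to the higher level."""
    if score >= HIGH_CUT:
        return "high"
    if score >= LOW_CUT:
        return "low"
    return "not detected"


def diagnostics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table.

    "Positive" means a level of high or low; "not detected" is negative.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    print(danger_level(30.0), danger_level(20.0), danger_level(10.0))
    # Counts back-calculated from the reported FIES percentages (n = 211).
    m = diagnostics(tp=30, fn=6, fp=58, tn=117)
    print({k: round(v, 3) for k, v in m.items()})
    # {'sensitivity': 0.833, 'specificity': 0.669, 'ppv': 0.341, 'npv': 0.951}
```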
When this model was applied to all 570 inmates in the sample, a further eight non-FIES inmates (from group S) were classified as presenting a high level of danger, an additional twenty-three as presenting a low level of danger (twenty-one S and two M), and 328 as danger not detected (197 S and 131 M). Once again, the model correctly classified 83.3 percent of the inmates with an increased level of danger (sensitivity), and this time also 83.3 percent of those who did not exhibit an increased level of danger (specificity) (Table 4); PPV = .252; NPV = .987.
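Specificity improved in the full sample while PPV fell from .341 to .252; this follows from the lower base rate (36 of 570 instead of 36 of 211) and can be checked with Bayes' theorem. In this sketch the full-sample specificity is back-calculated from the figures above (117 + 328 = 445 correctly classified among the 534 inmates without an increased level of danger).

```python
"""Why PPV falls in the full sample although specificity rises (sketch)."""


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # FIES only: 36 of 211 inmates have an increased level of danger.
    print(round(ppv(30 / 36, 117 / 175, 36 / 211), 3))  # 0.341
    # Whole sample: 36 of 570; specificity back-calculated as 445/534.
    print(round(ppv(30 / 36, 445 / 534, 36 / 570), 3))  # 0.252
    print(round(npv(30 / 36, 445 / 534, 36 / 570), 3))  # 0.987
```

The same sensitivity and a higher specificity still yield a lower PPV once the group of interest becomes rarer, which is why NPV, rather than PPV, is the more stable parameter when the index is applied to the whole prison population.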

Discussion
Terrorism is one of the most pressing international threats today. Because prisons are recognized by governments and most academics as a relevant setting for radicalization leading to violence, it is extremely important to develop systems to detect and prevent these processes. Here we have introduced an instrument to assess the risk of radicalization leading to violence in prisons, the DRAVY-3, developed in Spain.
In this manuscript, we have detailed the process of constructing this instrument and its implementation. The DRAVY-3 was built on: (1) existing tools; (2) an in-depth review of the literature on radicalization leading to violence; (3) the experience of prison staff with the population of interest; (4) indicators suggested by researchers from two relevant institutions; (5) a field study conducted with Muslim inmates (jihadists and non-jihadists); and (6) the results of six previous implementations of two preliminary versions of the instrument, including more than 2300 evaluations of inmates from forty different countries.
The final version of the DRAVY-3 includes sixty-three indicators. Of these, fifty come from the two previous versions of the DRAVY, 37 while thirteen were incorporated on the basis of theoretical and empirical background. The indicators of the DRAVY-3 were applied to more than 500 cases, including inmates from five different groups: three related to jihadist terrorism (inmates convicted of jihadist terrorism, radicalised persons who proselytise within the prison establishments, and inmates especially vulnerable to recruitment for subsequent radicalization) and two control groups (inmates under surveillance who showed incipient signs of radicalization, and Muslim inmates who did not demonstrate any sign of radicalization).
The statistical analyses of these observations showed that the retained indicators were distributed into three scales: general violence (thirteen indicators), ideological violence (six), and radicalism (thirty). Importantly, additional analyses revealed that a combination of items from these three scales (one item of general violence, three of ideological violence, and seventeen of radicalism), plus four of the items included in the previous versions of the instrument to record changes in daily routines or habits, formed an index of the inmates' level of danger. This index made it possible to identify a few inmates from the control groups with a high level of danger. In this way, the DRAVY-3 is not only a classification instrument but also includes an index of the inmates' potential level of danger of violent jihadist radicalization.
The fact that eleven of the sixty-three original indicators were not included in the three scales or in the index of danger does not mean that they should be eliminated from the instrument. Radicalization processes can vary, and because radicalization is a dynamic process, some indicators that are not useful now could become relevant in the future. The added value of this investigation is that it offers researchers and practitioners a procedure, with theoretical, methodological, and statistical support, for building a tool to assess the risk of radicalization in prison.
Compared with the previous versions of the instrument, the DRAVY-3 has three main characteristics that merit particular attention and that entail theoretical and empirical contributions.
First, the new indicators have been introduced based on the judgment of experts from scientific organizations and on the conclusions of field studies conducted with jihadist terrorists and other criminals (e.g., Gómez et al.). 38 Second, the indicators have been tested not only with individuals at different levels of radicalization but also with different types of control groups. Risk assessment and classification tools depend on the samples used to validate their indicators, so the selection of the subjects to be evaluated is crucial. In the case of Spanish prisons, adding the evaluations of the "S" subjects enriched the sample: since these inmates were already being monitored in the penitentiary centres, it made sense to study them as a control group even though they were not included in the FIES list.
And third, the DRAVY-3 builds on a considerable improvement in the data analyses, in particular the new treatment of missing values (which avoids methodological problems such as increased bias and inflated standard errors, as well as loss of statistical power; Graham et al.), 39 and the use of factor analysis to verify whether the indicators grouped within the theoretically proposed dimensions. In addition, the predictive capacity of a scale of the level of danger was analysed by calculating its sensitivity and specificity.
Compared with the existing risk assessment tools in this field, and considering the limitations already summarized in the introductory section, the DRAVY-3 has a series of important strengths. In addition to the second point described above (no other tool has been developed and tested with empirical data from a large, heterogeneous sample that includes cases at different levels of radicalization and a control group, which makes it possible to determine the indicators that distinguish between terrorists and non-terrorists), the DRAVY-3, compared with other instruments: (1) incorporates empirical evidence obtained in previous field research (e.g. Gómez et al.), 40 including direct manifestations of the inmates (and therefore takes into consideration the experiences of the individuals being evaluated, which is fundamental because their perceptions can affect their behaviour); 41 (2) is supported by analyses of the data collected with the instrument that have determined three scales usable for actuarial assessment or structured professional judgment (the lack of which is an important limitation of other tools); these scales avoid or reduce heterogeneity among evaluators, reduce observer bias, and help judicial and police authorities decide what to do when an inmate is released (in particular, the scale of dangerousness); and (3) reduces the resources needed to train evaluators, owing to the procedure used to build it and the fact that it is applied periodically. We must also recognize that one limitation of the DRAVY-3 is that it has been developed in a specific context (Spanish prisons), which means that not all of its indicators may be valid for prisons in other countries. However, this apparent limitation also provides a final advantage over other assessment tools: (4) the steps used to build the DRAVY-3 introduce a strategy for developing and applying an empirically based and theoretically supported instrument, a guideline that could be applied in the prisons of other countries.
In sum, the DRAVY-3 responds to the four main conceptual and methodological challenges for assessing the risk of terrorism: (1) it is clear about what is being assessed; (2) its content is very different from that of instruments that address common violence; (3) it identifies solid, individual, evidence-based risk factors; and (4) it is validated prospectively, with groups of terrorists and non-terrorists from the same populations. 42 When determining whether the indicators served to distinguish between the FIES (radicalised groups) and the non-FIES (control) inmates, significant differences were found in favour of the FIES group, mainly in the indicators of ideological violence and in almost all of the indicators of radicalism (39 of 43), but not in those of general violence (except V4 and V12). This suggests that general violence is not a good indicator of violent extremism; note that general violence and radical violence are organized in different dimensions. These findings replicate those obtained with the DRAVY-2 (see Gonzalez-Alvarez et al. 43 for more details). These results also showed that these indicators characterised the FIES and confirmed that these inmates exhibited radicalization behaviours. The indicators also adequately differentiated between the FIES inmates themselves (i.e., groups A, B, and C): inmates in group A exhibited the greatest number of indicators, followed by groups B and C. These results are as expected, since it is logical that those who are in prison for terrorist offences, and who have behaviourally demonstrated their disposition to commit antisocial behaviours of jihadist aetiology, exhibit more risk indicators; this provides empirical support for the classification system of the SGIP.
The multivariate analyses of the DRAVY indicators confirmed that they continued to group significantly within the classificatory scales or dimensions (R and V) found in previous applications, 44 with adequate levels of reliability. This study also enabled the weighting of each indicator, including the new ones, and the analysis showed that some indicators with strong theoretical support were not empirically significant in describing the studied groups: some presented very low frequencies of occurrence or did not reach significance, which is why their incorporation into the definitive scales was questioned. Nonetheless, although some indicators are not prevalent, when they do appear they significantly increase the risk according to expert judgment. Given the criminal and penitentiary implications involved, and in line with the specialized literature on risk assessment (e.g. see Douglas & Otto), 45 the assessment of individuals should not rely exclusively on the results of an instrument; the judgment of experts (based on observations, interviews, etc.) should also be considered. Therefore, following the approach of structured professional judgment, the less prevalent indicators will be kept as part of the DRAVY-3 even though they do not contribute points to the scales, so that experts can enrich their decisions by drawing on both the empirical (actuarial) scores and this additional qualitative information.
Regarding whether the indicators of changes in daily routines (H) could be definitively incorporated into any of the scales, once they were dichotomised according to maintenance (since the start or cessation of a routine was highly infrequent), the bivariate analyses showed significant differences. They were therefore incorporated into the scale of the level of danger, and they are also considered to qualify indicator R16 ("persistence of behaviour despite labelling") under the structured professional judgment approach already discussed, which also represents a step forward with respect to the previous versions of the instrument.
As noted above, an important contribution of the present investigation is that the instrument has a certain predictive, and not merely classificatory, capacity. The ideal solution would have been to construct a scale predicting the recidivism of terrorist behaviour of jihadist aetiology. However, there are (fortunately) not enough such cases in Spain, so this study instead attempted to predict the likelihood that inmates resemble the most dangerous group of inmates in this area of criminality. To do so, the scale of the level of danger was constructed and evaluated using predictive statistical parameters (AUC, sensitivity, specificity, positive predictive value, and negative predictive value), comparing the FIES inmates with an increased level of danger with the rest of the FIES, since this enables the identification of the indicators that best characterise the inmates with an increased level of danger. In the prison context it seems more appropriate to identify as many of the most dangerous inmates as possible, even if this slightly increases the number of false positives, which is why sensitivity was prioritised over specificity, together with a high negative predictive value. As the analyses show, the predictive parameters of the scale of the level of danger proved to be very good, and specificity and NPV were even better when the scale was applied to the entire sample rather than just the FIES (although PPV decreased, as expected when the base rate is lower); these results also improve on those of previous versions of the DRAVY. They show empirically that it is possible to distinguish the inmates with the greatest level of danger from the rest of the FIES using a specific set of characteristic indicators. In light of these results, this predictive index may also help to identify inmates who, without being classified as FIES, exhibit characteristics similar to those of the inmates with an increased level of danger. It could thus serve as an early warning system that alerts prison staff to observe more closely subjects who appear to be going unnoticed, in case it is appropriate to classify them as FIES.
To the best of our knowledge, this is the first time that the entire process of empirical construction and validation of a risk assessment instrument has been presented in the context of jihadist radicalization leading to violence, given that no similar studies of other instruments are known at the international level. 46 From an applied point of view, this work makes important contributions, since the analyses demonstrate that the DRAVY-3 is a valid and reliable instrument that meets the necessary requirements in terms of methodology and theoretical foundation. The quality of the instrument is paramount, since the results of its application influence decision-making during the enforcement of the sentence, in the treatment programmes applied, and in the coordination that various State security agencies must carry out when inmates are released. Thus, it may be concluded that the Prison Administration has access to an instrument suitable for detecting, treating, and preventing violent radicalism of jihadist aetiology in the prison setting, as established by the ENCOT in its strategic guidelines.
As with other instruments for assessing the risk of radicalization leading to violence in prisons, the DRAVY-3 is not free of limitations. First, during the construction of the instrument, the observations of the group of inmates incarcerated for jihadist terrorism (group A) were usually made on the same participants, because there are few cases of jihadist terrorism in Spain, these are moreover of low intensity, and very few new inmates were admitted to the penitentiary centres between the different applications of the instrument. Second, despite using not just a sample but the entire population of cases in each study, the inmates' stay in prison introduces a variable that is difficult to control, since their observable behaviours may change and thus influence their scores on the indicators. With regard to the classificatory scale, this means that it is methodologically impossible to isolate a true normative group from which to calculate meaningful parameters that would serve to establish generalisable standards in the long term according to classic psychometric criteria. The data from the sixth application of the instrument used in this study are nothing more than a new "snapshot," compiled with five groups of subjects (A, B, C, S, and M), that will be used as a template or yardstick from now on. Third, as already mentioned, the lack of recidivism hinders the construction of a scale able to fully predict the risk that an inmate convicted of terrorism will engage in this type of violence again upon leaving prison. 47 And fourth, the low number of women involved in terrorism in Spain means that they are systematically left out of the DRAVY, which is therefore only useful for evaluating men.
All these limitations, among others, might seem to restrict the use of the DRAVY-3 in the penitentiary centres of other countries.The dissimilarities between nations, especially in the type of inmates, the cultural differences, and the kind of terrorist activity, mean that this instrument cannot have the same usefulness in all territories.However, we consider that one of the strongest

Table 1. DRAVY-3 indicators: comparison between FIES and non-FIES inmates

Table 2. Cut-off points (CP) of the scales of violence (GV and EIV) and radicalization (R) (N = 570), and distribution of inmates classified at each level

Table 3. Distribution of FIES inmates according to their risk of dangerousness (n = 211)

Table 4. Distribution of the inmates included in the study according to their risk of dangerousness (n = 570)