INDEX
Explanations
names and terms related to individuals or places
instances of the word "rav" and related names, indicating a focus on violent or destructive behavior
Negative Logits
ntil      -0.77
ADRA      -0.72
Ŀ         -0.71
ware      -0.70
ĨĴ        -0.68
OPER      -0.67
é¾        -0.65
uador     -0.64
ndum      -0.64
mare      -0.63
Positive Logits
iol        0.81
inia       0.79
illac      0.78
inian      0.75
iants      0.73
agement    0.72
iasis      0.72
ashtra     0.71
ion        0.71
iator      0.71
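The dashboard does not state how these logit values are computed. A common approach, assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The NumPy sketch below is a minimal illustration of that assumption. The helper `top_logit_effects` and its parameters `direction`, `unembed`, and `vocab` are hypothetical names.

```python
import numpy as np

def top_logit_effects(direction, unembed, vocab, k=10):
    """Rank tokens by how strongly a feature direction pushes their logits.

    Hypothetical setup (an assumption, not the dashboard's confirmed method):
      direction: feature decoder vector, shape [d_model]
      unembed:   unembedding matrix, shape [d_model, vocab_size]
      vocab:     list of token strings, length vocab_size
    """
    effects = direction @ unembed          # per-token logit effect, shape [vocab_size]
    order = np.argsort(effects)            # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional model and a 3-token vocabulary.
direction = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.2, 0.9],
                    [0.0,  0.0, 0.0]])
neg, pos = top_logit_effects(direction, unembed, ["a", "b", "c"], k=1)
print(neg, pos)
```

Sorting once and slicing from both ends gives the two lists from a single pass, matching the way the dashboard shows the ten lowest and ten highest values side by side.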
Activation Density: 0.061%
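Activation density is the share of token positions on which the feature is active. A density of 0.061% corresponds to roughly 61 active positions per 100,000 tokens. The sketch below assumes that a position counts as active when its activation is strictly above zero. Both that threshold and the helper name `activation_density` are assumptions, not details taken from the dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption (a ReLU-style feature is
    treated as active whenever it is positive).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 61 active positions out of 100,000 gives a density of about 0.061%.
acts = np.zeros(100_000)
acts[:61] = 1.0
print(activation_density(acts))
```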