INDEX
Explanations
phrases related to instructions or commands
occurrences of the word "dos" and variations related to dosage
Negative Logits
Token        Logit
ISM          -0.96
Reviewer     -0.80
Immunity     -0.73
gypt         -0.71
ICAN         -0.70
ocene        -0.70
INTON        -0.69
raud         -0.69
WB           -0.67
istically    -0.64
POSITIVE LOGITS
Token        Logit
omething     1.29
dos          1.15
Dos          1.03
Santos       1.01
hiba         0.98
wana         0.81
dos          0.80
pec          0.80
ega          0.79
ques         0.79
Activations Density: 0.005%