Explanations
commands or phrases indicating permission or allowance
Negative Logits
Token      Logit
edList     -0.18
erland     -0.16
rien       -0.15
edImage    -0.15
рах        -0.15
surname    -0.15
ivent      -0.14
edl        -0.14
ventus     -0.14
pectral    -0.14
Positive Logits
Token      Logit
tings      0.29
ting       0.27
us         0.25
loose      0.24
ícia       0.22
go         0.22
tres       0.20
TING       0.19
itia       0.19
know       0.19
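The two logit lists are usually read as the feature's direct effect on the output vocabulary: the tokens it most promotes and most suppresses. A minimal sketch, assuming each value is the dot product of the feature's decoder direction with a token's unembedding vector. The function names, the toy two-dimensional vectors, and the dot-product definition are illustrative assumptions, not taken from the dashboard's own code.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: project a feature's decoder direction onto
    each token's unembedding vector (assumed definition of 'logit')."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most promoted and the k most suppressed tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy unembedding: only the first dimension aligns with the decoder direction.
    unembed = {
        "ting": [0.27, 0.5],
        "us": [0.25, 0.1],
        "surname": [-0.15, 0.2],
        "edList": [-0.18, 0.0],
    }
    pos, neg = top_k(logit_effects([1.0, 0.0], unembed), 2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting the full vocabulary twice is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.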
Activation Density: 0.032%
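A density of 0.032% means the feature fires on roughly 1 in 3,125 tokens (100 / 0.032 = 3,125), so it is a very sparse feature. A minimal sketch of how such a figure could be computed, assuming "density" is the percentage of sampled tokens whose activation is strictly above zero. The function name and the threshold parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens on which the feature fires.

    Assumption: a token counts as 'firing' when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 32 firing tokens out of 100,000 sampled tokens.
    acts = [1.0] * 32 + [0.0] * (100_000 - 32)
    print(f"{activation_density(acts):.3f}%")
```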