INDEX
Explanations
words related to permission or enabling actions
Negative Logits
iche      -0.15
yt        -0.15
ventus    -0.14
ril       -0.14
alent     -0.14
svp       -0.14
ething    -0.14
aeda      -0.14
ichel     -0.14
hsi       -0.14
Positive Logits
us            0.27
ance          0.24
fullscreen    0.23
ances         0.21
him           0.20
for           0.18
flexibility   0.18
them          0.18
/dis          0.18
/disable      0.18
Activations Density 0.051%