Explanations
phrases indicating permission or the act of allowing something to happen
Negative Logits
Token       Logit
vece        -0.71
\{\\        -0.71
Ceinture    -0.60
tagext      -0.59
뀜          -0.57
<?,         -0.57
spørs       -0.57
뀝          -0.56
Webber      -0.56
hoga        -0.56
Positive Logits
Token       Logit
allow       3.71
Allow       3.70
Allow       3.61
allow       3.43
ALLOW       3.34
allowing    3.19
Allowing    3.18
allowed     3.13
allows      3.09
allowing    2.94
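Lists of positive and negative logits like the ones above are typically produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes exactly that (and ignores any final layer norm). The names `logit_effects`, `top_and_bottom`, `decoder_dir` and `W_U` are hypothetical, and the numbers are toy values, not this feature's actual weights.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], W_U: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs, where effect = decoder_dir . W_U[:, j].

    Assumption: W_U is stored as d_model rows of vocab_size columns, and the
    direct logit effect is a plain dot product (no layer norm applied).
    """
    d_model = len(decoder_dir)
    return [
        (tok, sum(decoder_dir[i] * W_U[i][j] for i in range(d_model)))
        for j, tok in enumerate(vocab)
    ]

def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Split the ranked effects into the k most positive and k most negative."""
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    top = ranked[:k]              # largest positive effects first
    bottom = ranked[::-1][:k]     # most negative effects first
    return top, bottom

# Toy example (made-up numbers): 2-dim residual stream, 3-token vocabulary.
effects = logit_effects([1.0, 0.0],
                        [[3.0, -1.0, 0.5],
                         [0.0,  2.0, 0.0]],
                        ["allow", "vece", "the"])
top, bottom = top_and_bottom(effects, k=1)
print(top, bottom)  # [('allow', 3.0)] [('vece', -1.0)]
```

Returning a list of pairs rather than a dict keeps entries whose token strings look identical after extraction (such as the repeated "allow" rows) from overwriting each other.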
Activation Density: 0.173%
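Activation density is commonly read as the fraction of token positions on which the feature fires at all. Assuming that definition (activation strictly above zero), 0.173% corresponds to a fraction of 0.00173, i.e. the feature is active on roughly 1.7 of every 1,000 tokens. A minimal sketch with a hypothetical `activation_density` helper and toy activations:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "active" means strictly greater than zero by default.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data: 173 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for pos in range(173):
    acts[pos * 500] = 1.0   # arbitrary, non-overlapping positions

print(f"{activation_density(acts):.3%}")  # 0.173%
```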