Explanations
structures or formats related to lists and arrays in code
Negative Logits
Token            Logit
<em>             -0.79
in               -0.75
                 -0.73
er               -0.72
en               -0.72
[toxicity=0]     -0.70
z                -0.68
<i>              -0.68
1                -0.67
-                -0.67
Positive Logits
Token            Logit
myſelf           1.27
themſelves       1.25
himſelf          1.24
poffible         1.21
auffi            1.20
Jefus            1.20
Monfieur         1.19
ainfi            1.15
purpoſe          1.13
neceffary        1.12
Activations Density: 0.145%