Explanations
strings or sequences related to coding or programming constructs
Negative Logits
myſelf         -0.88
expandindo     -0.88
himſelf        -0.87
sizeCache      -0.86
chofe          -0.86
Spisak         -0.85
ſelf           -0.84
HasForeignKey  -0.84
ISNI           -0.83
raiſ           -0.82
Positive Logits
[toxicity=0]   0.52
</strong>      0.48
               0.47
ństwa          0.45
...            0.45
orghe          0.44
falen          0.43
-              0.43
otherwise      0.42
</b>           0.42
Activations Density: 0.433%