Explanations
words and phrases indicating actions or directives
Negative Logits
Token        Logit
Switcher     -0.76
jetbrains    -0.72
gethan       -0.70
sonne        -0.68
Florentine   -0.68
fubject      -0.67
Shakspeare   -0.67
fisher       -0.66
NDEBUG       -0.65
scattata     -0.65
Positive Logits
Token        Logit
to           1.58
TO           1.17
To           1.08
be           0.98
να           0.97
To           0.96
make         0.94
zu           0.91
to           0.88
yto          0.87
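The dashboard does not say how these logit scores are computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest- and lowest-scoring tokens. The sketch below illustrates that assumption on a toy matrix; `top_logit_effects`, `W_U`, and `decoder_vec` are hypothetical names, and a real pipeline may apply layer norm or other scaling first.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper (assumed method, not the dashboard's confirmed one):
    score every vocabulary token by projecting one feature's decoder direction
    through the unembedding matrix, then return the k lowest and k highest."""
    # logits[j] = decoder_vec . W_U[:, j], one score per vocabulary token
    logits = decoder_vec @ W_U
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 3-dimensional residual stream, 4-token vocabulary.
vocab = ["to", "be", "sonne", "fisher"]
W_U = np.array([[1.0, 0.2, -0.5, -0.6],
                [0.0, 0.3, 0.1, 0.0],
                [0.5, 0.4, 0.0, -0.1]])
decoder_vec = np.array([1.0, 0.0, 1.0])

neg, pos = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

In this toy setup the scores are roughly 1.5, 0.6, -0.5 and -0.7, so "to" and "be" head the positive list while "fisher" and "sonne" head the negative one.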
Activation density: 2.480%
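Activation density is presumably the fraction of tokens in the evaluated dataset on which the feature is active, so 2.480% would mean roughly one token in forty. The sketch below assumes "active" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical names, not part of any confirmed dashboard API.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose feature activation
    exceeds `threshold` (assumed definition of "active")."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy activations over 10 tokens; the feature fires on exactly one of them.
acts = np.array([0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 10.000%"
```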