INDEX
Explanations
recurring phrases that indicate belonging or connection
Negative Logits
Token        Logit
PYX          -0.74
Jefus        -0.74
fevere       -0.67
Majefty      -0.64
ValueStyle   -0.64
✭✭           -0.63
itſelf       -0.62
Shakspeare   -0.62
Chriftian    -0.61
onlyOwner    -0.61
Positive Logits
Token        Logit
OutOf        0.78
whack        0.74
outta        0.72
the          0.66
touch        0.65
reach        0.64
bounds       0.63
nowhere      0.63
context      0.62
">)</        0.62
Activations Density: 0.054%