Explanations
sequences of repeated special characters
patterns of repeated characters or symbols
Negative Logits
Logit  Token
-0.78  Samar
-0.71  achus
-0.62  ake
-0.62  milo
-0.62  denying
-0.60  Watkins
-0.59  adelphia
-0.59  minds
-0.59  pegged
-0.58  anke
Positive Logits
Logit  Token
 1.03  --------------------------------------------------------
 0.89  --------------------------------
 0.87  =-
 0.86  ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 0.80  EGIN
 0.77  =-=-=-=-=-=-=-=-
 0.77  ++++
 0.77  ----
 0.77  âĸł
 0.76  ---
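The positive-logit token `âĸł` is not readable as printed. It looks like the byte-level spelling that GPT-2-style BPE vocabularies use for non-ASCII tokens. The sketch below assumes that tokenizer family (the page itself does not name the tokenizer) and rebuilds the standard byte-to-character table to recover the underlying text. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not taken from this page.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style mapping from byte values to printable characters.

    Assumption: the tokens shown on this page come from a byte-level BPE
    vocabulary that uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Convert a byte-level token spelling back into ordinary text."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token may hold only part of a UTF-8 sequence, so bad bytes
    # are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("\u00e2\u0138\u0142"))  # the token printed as âĸł
```

Under that assumption the token decodes to the bytes E2 96 A0, which is U+25A0 (a black square). That would fit the explanations given for this feature, since such a character often appears in repeated runs. ASCII tokens such as `----` or `=-` pass through unchanged.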
Activations Density 0.342%