INDEX
Explanations
sequences of repeated characters or whitespace
Negative Logits
desmotivaciones   -0.94
ſche              -0.94
queſta            -0.92
ſta               -0.90
indígen           -0.85
auroit            -0.85
pleaſure          -0.84
pouvoit           -0.84
ſelf              -0.83
avoient           -0.83
Positive Logits
(blank)            0.79
(blank)            0.71
(blank)            0.65
(blank)            0.62
The                0.57
a                  0.55
O                  0.53
I                  0.50
(blank)            0.50
YS                 0.49
Activations Density 0.005%
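
For readers unfamiliar with the density figure, here is a minimal sketch of one way such a number could be computed. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position out of 20,000.
sample = [0.0] * 19_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.005%
```

Under that assumed definition, a density of 0.005% would correspond to the feature firing on roughly one token in every 20,000.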