Explanations
No Explanations Found
Negative Logits (token, logit)
―――――       -1.12
iſt         -1.05
pleaſure    -1.04
auffi       -1.02
ſelf        -1.01
ſche        -1.00
faſt        -0.98
dieß        -0.97
againſt     -0.96
myſelf      -0.95
Positive Logits (token, logit)
[token not shown]    1.67
mathrm               0.81
$                    0.75
K                    0.67
T                    0.65
S                    0.65
[token not shown]    0.65
B                    0.65
(                    0.65
D                    0.65
Activations Density 0.000%
This feature has no known activations.