INDEX
Explanations
No Explanations Found
Negative Logits
Token     Logit
faſt      -0.95
queſta    -0.93
ſelf      -0.88
houſe     -0.81
ſelves    -0.81
Houſe     -0.77
$_(       -0.77
ſch       -0.77
ſta       -0.76
ſche      -0.75
Positive Logits
1.41
1.15
1.07
0.98
0.90
0.88
0.87
0.81
0.80
0.79
Activations Density 0.000%
No Known Activations
This feature has no known activations.