Explanations
No Explanations Found
Negative Logits
the      -0.60
↵↵       -0.57
,        -0.56
in       -0.55
all      -0.54
↵        -0.54
ca       -0.53
p        -0.53
G        -0.52
te       -0.52
Positive Logits
Efq            0.99
becauſe        0.97
pleaſure       0.94
AddTagHelper   0.93
Majefty        0.92
^(@)           0.89
Monfieur       0.89
itſelf         0.88
للمعارف        0.86
Theſe          0.86
Activation Density: 0.000%
No Known Activations
This feature has no known activations.