Explanations
references to attributes or qualities related to various topics
Negative Logits
avoient  -1.15
auroit   -1.12
feroit   -1.09
pouvoit  -1.09
ainfi    -1.09
Efq      -1.08
Majefty  -1.08
zoude    -1.08
myſelf   -1.07
quæ      -1.07
Positive Logits
̣c            0.79
              0.62
(             0.61
a             0.61
[toxicity=0]  0.60
́i            0.58
?             0.57
,             0.57
Ro            0.55
s             0.54
Activations Density 0.161%