INDEX
Explanations
specific non-English or specialized characters and scripts
Negative Logits
men            -0.56
fast           -0.55
g              -0.54
my             -0.54
ges            -0.54
               -0.53
ff             -0.53
one            -0.53
iste           -0.52
ans            -0.52
Positive Logits
itſelf          1.01
étoit           0.93
feroit          0.90
avoient         0.88
ainfi           0.88
autorytatywna   0.86
Majefty         0.85
pleaſure        0.84
Anſ             0.83
moſt            0.83
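The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. One common way such lists are produced (an assumption here, not something this page states) is to take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k highest and k lowest scores. The sketch below uses plain Python lists. The function names `dot` and `top_logits` and the toy two-dimensional vectors are hypothetical and chosen only for illustration.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def top_logits(decoder_dir, unembed_cols, k=10):
    """Return (positive, negative) lists of (token, score) pairs.

    Assumption: a token's score is the dot product of the feature's
    decoder direction with that token's unembedding column.
    """
    scores = {tok: dot(decoder_dir, col) for tok, col in unembed_cols.items()}
    pos = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    neg = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return pos, neg


# Toy data: a hypothetical 2-d decoder direction and unembedding columns.
direction = [1.0, -0.5]
toy_unembed = {
    "itſelf": [0.8, -0.4],
    "étoit": [0.5, 0.2],
    "men": [-0.3, 0.4],
    "fast": [0.1, 0.1],
}
pos, neg = top_logits(direction, toy_unembed, k=2)
print([(t, round(s, 2)) for t, s in pos])  # [('itſelf', 1.0), ('étoit', 0.4)]
print([(t, round(s, 2)) for t, s in neg])  # [('men', -0.5), ('fast', 0.05)]
```

With a real model, the unembedding would have one column per vocabulary entry, and k=10 would yield two ten-row lists shaped like the ones above.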
Activation Density: 0.208%
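Activation density is the fraction of token positions on which the feature is active. If "active" means an activation strictly above zero (the exact threshold behind this figure is not stated, so treat it as an assumption), then 0.208% corresponds to roughly 1 token in every 481. The following is a minimal sketch with a hypothetical function name:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is strictly above `threshold`.

    The default threshold of 0.0 is an assumption, not a documented setting.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Example: 2 active positions out of 1,000 tokens.
acts = [0.0] * 1000
acts[10] = 1.3
acts[500] = 0.7
print(f"{activation_density(acts):.3%}")  # 0.200%
```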