INDEX
Explanations
expressions indicating newness or beginner status
Negative Logits
“  -0.53
roch  -0.47
"  -0.45
ir  -0.43
供  -0.43
prí  -0.43
en  -0.41
zu  -0.40
bij  -0.40
ti  -0.39
Positive Logits
myſelf  1.28
Shakspeare  1.06
Efq  0.96
themſelves  0.93
Wikimédia  0.90
reaſon  0.90
himſelf  0.89
Monfieur  0.87
houſe  0.86
EndContext  0.86
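The two logit lists appear to be the tokens whose output logits this feature most decreases and most increases. This page does not say how they were produced. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder vector through the model's unembedding matrix and sort the per-token scores. The sketch below uses plain Python and made-up toy weights. `logit_effects`, `top_and_bottom`, and all numbers are hypothetical and are not this feature's real values.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction pushes their logits up (positive) or down (negative).

def logit_effects(decoder_dir, unembed):
    """Return one score per vocabulary token: sum_i decoder_dir[i] * unembed[i][t].

    decoder_dir: list of d_model floats (assumed feature decoder vector).
    unembed: d_model rows, each a list of vocab_size floats (assumed W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(effects, vocab, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # smallest scores first
    positive = list(reversed(ranked[-k:]))  # largest scores first
    return negative, positive

# Toy example with made-up numbers (d_model = 2, vocab_size = 4).
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -1.0, 0.5, 0.0],
    [0.4, 0.0, -1.0, 1.0],
]
effects = logit_effects(decoder_dir, unembed)
neg, pos = top_and_bottom(effects, vocab, k=2)
print(neg)  # [('b', -1.0), ('d', -0.5)]
print(pos)  # [('c', 1.0), ('a', 0.0)]
```

A real model would use matrix libraries and a vocabulary of tens of thousands of tokens. The ranking step is the same.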
Activation Density: 0.407%
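The density figure is presumably the percentage of sampled token positions on which the feature fires. The page states neither the sample nor the threshold. The sketch below assumes that "fires" means an activation strictly above zero. `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions whose feature activation exceeds a threshold (assumed 0.0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy data: the feature fires on 4 of 1000 token positions (made-up values).
acts = [0.0] * 1000
for pos in (3, 250, 600, 999):
    acts[pos] = 1.5
print(activation_density(acts))  # 0.4
```

On this reading, a density of 0.407% means the feature is active on roughly 4 of every 1,000 tokens.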