Explanations
references to publication volumes
Negative Logits
myſelf        -0.86
pleaſure      -0.80
wiſe          -0.77
eriksaan      -0.75
Monfieur      -0.74
themſelves    -0.73
himſelf       -0.73
✨:            -0.71
Theſe         -0.68
itſelf        -0.67
Positive Logits
Vol           2.56
vol           2.53
Vol           2.45
vol           2.38
VOL           2.25
VOL           2.08
Vols          1.70
vols          1.52
vols          1.35
Volker        1.18
Activations Density 0.032%