INDEX
Explanations
content related to popularity or widely recognized subjects
Negative Logits
(token not captured)  -0.72
ள்                    -0.65
niega                 -0.64
Sek                   -0.61
ValueStyle            -0.60
τ                     -0.60
gms                   -0.59
c                     -0.58
cu                    -0.58
Biss                  -0.58
POSITIVE LOGITS
Efq         1.62
pleaſure    1.58
Theſe       1.55
myſelf      1.55
Jefus       1.53
Monfieur    1.47
themſelves  1.47
whoſe       1.42
Majefty     1.42
himſelf     1.41
Activations Density 0.037%