INDEX
Explanations
the beginning and end of structured data or code segments
Negative Logits
un       -0.83
so       -0.81
ha       -0.79
con      -0.79
in       -0.77
of       -0.74
it       -0.74
super    -0.73
pro      -0.73
,        -0.73
Positive Logits
myſelf   1.40
Jefus    1.38
himſelf  1.33
ſelves   1.32
ſelf     1.29
itſelf   1.27
Anſ      1.27
Eſ       1.26
Houſe    1.26
Efq      1.25
Activation Density: 0.100%