Explanations
appeals for financial support or donations
phrases related to donations and support for causes
Negative Logits
equilibrium     -0.64
Poles           -0.59
hallucinations  -0.57
interpreted     -0.57
surfaces        -0.57
Fou             -0.56
orth            -0.55
Exactly         -0.55
angles          -0.54
pects           -0.54
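Dashboards of this kind usually build the logit lists by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The negative logits listed above would be the lowest-scoring tokens, and the positive logits that follow would be the highest-scoring ones. The sketch below assumes that method. The function name `logit_effects`, the two-dimensional vectors, and the five-token vocabulary are hypothetical, and it does not model any normalization the actual dashboard may apply to its scores.

```python
# Minimal sketch (assumed method): score every vocabulary token by the dot
# product of a feature's decoder direction with that token's unembedding
# column, then keep the k highest and k lowest scores.

def logit_effects(decoder_dir, unembed, vocab, k):
    """Return (top, bottom): the k highest- and k lowest-scoring tokens.

    decoder_dir -- feature decoder direction, length d_model
    unembed     -- W_U as d_model rows, each with one entry per vocab token
    vocab       -- token strings, in the same order as W_U's columns
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]       # most negative scores first
    top = ranked[::-1][:k]    # most positive scores first
    return top, bottom


# Hypothetical toy data: d_model = 2, five-token vocabulary.
vocab = ["Subscribe", "coupon", "equilibrium", "angles", "the"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.9, 0.8, -0.6, -0.5, 0.0],
    [0.0, 0.2, 0.1, 0.0, 0.1],
]

top, bottom = logit_effects(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in top])     # ['Subscribe', 'coupon']
print([tok for tok, _ in bottom])  # ['equilibrium', 'angles']
```

The toy scores are 0.9, 0.7, -0.65, -0.5, and -0.05. Sorting once and slicing both ends gives the two lists without a second pass over the vocabulary.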
Positive Logits
Subscribe        0.92
coupon           0.87
Patreon          0.86
istration        0.86
bookmark         0.86
subscribe        0.85
ðŁij             0.84
Subscribe        0.84
subscribing      0.82
Membership       0.81
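The token shown as ðŁij in the positive list appears to be a byte-level token and not damage to the page. GPT-2 style byte-level BPE tokenizers display each raw byte as a printable character. In that scheme ðŁĳ (with ĳ, U+0133, which extraction may have flattened to "ij") stands for the UTF-8 bytes F0 9F 91, which begin emoji in the range U+1F440 to U+1F47F, such as 👍. The sketch below assumes the dashboard's model uses a GPT-2 style byte-to-character table. That is an assumption, and the helper name `token_to_bytes` is hypothetical.

```python
# Sketch assuming a GPT-2 style byte-level BPE display convention:
# printable bytes map to themselves; every other byte maps to chr(256 + n).

def bytes_to_unicode():
    """Return a dict mapping each byte value 0..255 to a printable character."""
    bs = (
        list(range(33, 127))      # '!' .. '~'
        + list(range(161, 173))   # '¡' .. '¬'
        + list(range(174, 256))   # '®' .. 'ÿ'
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:           # unprintable byte: shift past Latin-1
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table so a displayed token string can be turned back into bytes.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Hypothetical helper: recover the raw bytes behind a displayed token."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


token = "\u00f0\u0141\u0133"            # displayed as ðŁĳ
raw = token_to_bytes(token)
print(raw)                              # b'\xf0\x9f\x91'
print((raw + b"\x8d").decode("utf-8"))  # 👍 (U+1F44D)
```

The three bytes alone are an incomplete UTF-8 sequence. The tokenizer completes the emoji with a following token, so this one token can carry the "thumbs up / like" sense that fits the feature's subscribe-and-support theme.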
Activation density: 0.347%
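An activation density of 0.347% means the feature is active on roughly 3.5 of every 1,000 token positions in the sample the dashboard evaluated. The sketch below assumes density is the fraction of positions whose activation is strictly above zero. The dashboard may use a different threshold. The function `activation_density` and the sample data are hypothetical.

```python
# Sketch assuming density = (# positions with activation > threshold) / (# positions).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 347.
acts = [0.5] * 347 + [0.0] * (100_000 - 347)
density = activation_density(acts)
print(f"{density:.3%}")  # 0.347%
```

A low density like this is what you would expect from a narrow feature that fires mainly on subscription and donation appeals.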