Explanations
expressions of gratitude or appreciation towards people
Negative Logits
curfew         -0.73
Worse          -0.67
Doctrine       -0.67
ゼウス         -0.65
thing          -0.63
Throne         -0.63
Hide           -0.63
Uniform        -0.61
Catholicism    -0.59
Zionism        -0.59
Positive Logits
generously      0.96
contributions   0.95
Cosponsors      0.91
contributors    0.90
rats            0.87
generous        0.86
encour          0.83
generosity      0.82
participating   0.80
contributing    0.80
Activation Density 0.080%
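The two statistics above can be illustrated with a minimal sketch. It rests on assumptions not stated in this card: that each logit score is the dot product of the feature's decoder direction with a token's unembedding vector, and that activation density is the percentage of token positions where the feature's activation is above zero. The function names, toy vectors, and numbers are hypothetical and do not reproduce this dashboard's actual pipeline or values.

```python
from typing import Dict, List, Tuple


def activation_density(activations: List[float]) -> float:
    """Percentage of token positions where the feature fires (activation > 0).

    Assumption: "fires" means strictly positive activation.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the dot product of a (hypothetical) decoder direction
    with each token's unembedding vector.

    Returns (most negative k, most positive k), each ordered from the
    strongest effect down, matching the layout of the lists above.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy data only: 2-dimensional vectors, not real model weights.
    toy_unembed = {
        "generous": [1.0, 0.0],
        "curfew": [-1.0, 0.0],
        "thing": [0.0, 1.0],
    }
    neg, pos = top_logit_effects([0.9, -0.1], toy_unembed, k=2)
    print("Negative Logits", neg)
    print("Positive Logits", pos)
    density = activation_density([0.0] * 9999 + [1.2])
    print(f"Activation Density {density:.3f}%")
```

Under these assumptions, a feature that fires on 1 of 10,000 positions would have a density of 0.010%; the 0.080% reported here would correspond to roughly 8 positions per 10,000.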