Explanations
statements asserting the inclusion or cooperation of a specific group within a community
Negative Logits
ravel    -0.71
Tanz     -0.70
ragon    -0.69
happ     -0.65
quer     -0.65
ords     -0.65
iates    -0.64
vet      -0.63
Olymp    -0.63
æ©       -0.63
Positive Logits
namely             0.84
"'                 0.75
disbelief          0.74
disclaimer         0.73
congratulations    0.72
affirmation        0.71
skepticism         0.70
"...               0.69
Eat                0.69
humility           0.68
Activation density: 0.376%