INDEX
Explanations
positive characteristics and values, such as reliability, friendship, curiosity, justice, and creativity
themes related to tolerance and social justice
Negative Logits
cko      -0.70
herent   -0.69
assed    -0.69
upiter   -0.69
kees     -0.67
riber    -0.66
riad     -0.66
utor     -0.65
thal     -0.65
tale     -0.65
Positive Logits
albeit         1.03
etc            0.85
infiltration   0.83
which          0.82
whereby        0.81
aka            0.80
creativity     0.78
tein           0.76
whereas        0.75
namely         0.75
Activations Density: 0.493%