Explanations
different types or categories of objects or concepts
mentions of diversity and different categories or classifications
Negative Logits
arton           -0.64
mobi            -0.61
jew             -0.61
efficiency      -0.61
Brave           -0.58
iron            -0.56
coal            -0.56
ayn             -0.56
Adamant         -0.56
requisite       -0.55
Positive Logits
differing        1.00
depending        0.99
different        0.96
imaginable       0.94
paces            0.93
configurations   0.91
sexes            0.85
vying            0.84
varying          0.84
styles           0.83
Activation Density: 0.249%