Explanations
phrases related to self-awareness and acknowledgment of societal issues
Negative Logits
unj          -0.15
eniable      -0.14
iber         -0.14
aland        -0.14
ungeons      -0.14
iad          -0.13
lon          -0.13
Tome         -0.13
land         -0.13
zm           -0.13
Positive Logits
istrovství    0.16
etas          0.15
holm          0.14
바로          0.14
帖            0.14
ysz           0.14
rab           0.14
iew           0.14
reste         0.13
punk          0.13
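The negative and positive logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists are commonly produced, assuming the usual logit-lens approach of multiplying the feature's decoder vector by the model's unembedding matrix (the dashboard's exact pipeline, including any normalization, is not stated here). The function name `top_logits` and the toy matrices are hypothetical, and pure Python is used so the sketch has no dependencies.

```python
from collections.abc import Sequence


def top_logits(
    decoder_row: Sequence[float],          # hypothetical: feature's decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # hypothetical: unembedding matrix, shape [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    d_model = len(decoder_row)
    # logit[t] = sum over i of decoder_row[i] * unembed[i][t]
    logits = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits(
    [1.0, -0.5],
    [[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]],
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

In this toy case token `b` receives about -0.3 and token `a` receives 0.2, so they head the negative and positive lists respectively.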
Activation Density: 0.147%
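The activation density figure above is the share of token positions on which the feature fires; 0.147% corresponds to roughly 147 of every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the dashboard's exact threshold and sampled dataset are not stated here). The function name `activation_density` is hypothetical.

```python
from collections.abc import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 147 nonzero activations out of 100,000 tokens gives a density of about 0.147%.
acts = [0.0] * 100_000
for i in range(147):
    acts[i] = 1.0
print(activation_density(acts))
```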