INDEX
Explanations
phrases related to making the world a better place
concepts related to improving the world or societal betterment
Negative Logits
tein         -0.75
persistence  -0.71
Removal      -0.69
incent       -0.68
omission     -0.68
IOR          -0.67
leakage      -0.63
inelli       -0.62
retention    -0.62
phrase       -0.62
Positive Logits
revolves     0.92
Thumbnail    0.81
darkened     0.80
habitable    0.75
inhabited    0.74
hosp         0.74
enslaved     0.72
bends        0.71
ankind       0.70
wake         0.70
Activations Density 0.322%
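
The density figure above can be read as a simple ratio. The sketch below assumes that "activation density" means the percentage of sampled tokens on which the feature has a nonzero activation; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")  # prints "0.300%"
```

Under this reading, a density of 0.322% would correspond to the feature firing on roughly 3 out of every 1000 sampled tokens.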