Explanations
phrases related to improving the world or making it a better place
Negative Logits
tein             -0.97
Removal          -0.77
oyal             -0.73
omission         -0.69
roversial        -0.68
caut             -0.67
interval         -0.66
effectiveness    -0.63
persistence      -0.63
itone            -0.63
Positive Logits
revolves         0.83
Thumbnail        0.80
engulfed         0.75
anew             0.74
darkened         0.73
wake             0.71
liv              0.71
trillions        0.69
ravaged          0.67
opolis           0.67
Activations Density 0.469%