Explanations
mentions of ideal or perfection
concepts related to ideals and idealism
Negative Logits
bane        -0.83
words       -0.73
ker         -0.70
cloth       -0.69
cano        -0.66
SEE         -0.66
hani        -0.66
issues      -0.65
berries     -0.65
manship     -0.64
Positive Logits
istic       1.26
istically   1.23
ized        0.97
imates      0.93
imum        0.86
ization     0.86
ised        0.83
istical     0.80
embodiment  0.80
imately     0.79
Activations Density: 0.026%