Explanations
phrases emphasizing self-awareness and self-referential themes
"self" or its variations
words starting with self-
Negative Logits
enschaften      -0.62
geweest         -0.56
ViewFeatures    -0.54
IntoConstraints -0.50
amazonaws       -0.50
setViewName     -0.50
coû             -0.49
cuer            -0.49
fidélité        -0.48
Glej            -0.48
Positive Logits
self           1.01
Self           0.89
SELF           0.83
Self           0.82
ValueStyle     0.82
SELF           0.70
Etr            0.68
Selbst         0.66
Selbst         0.65
introspection  0.64
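The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows only that ranking step. It assumes the per-token effects are already available as a token-to-value mapping. How the dashboard computes those values is not stated here; projecting the feature's decoder direction through the unembedding is a common approach, but treat it as an assumption. The function name `top_logit_effects` is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    `effects` is a hypothetical precomputed mapping from token string to the
    feature's effect on that token's logit.
    """
    if k <= 0:
        return [], []
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]                   # most suppressed tokens first
    positive = list(reversed(ranked[-k:]))  # most promoted tokens first
    return negative, positive


# Toy input built from a few values in the lists above.
effects = {
    "enschaften": -0.62,
    "geweest": -0.56,
    "ViewFeatures": -0.54,
    "self": 1.01,
    "Selbst": 0.66,
    "introspection": 0.64,
}
neg, pos = top_logit_effects(effects, k=2)
print(neg)  # [('enschaften', -0.62), ('geweest', -0.56)]
print(pos)  # [('self', 1.01), ('Selbst', 0.66)]
```

A plain dict cannot hold the repeated entries seen above (for example the two `Self` rows, which likely differ only by invisible whitespace). A real pipeline would key on token IDs rather than strings.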
Activation Density: 0.040%
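The density figure is read here as the share of sampled tokens on which the feature fires. That definition is an assumption about this dashboard. Under it, 0.040% means roughly 4 in every 10,000 tokens. The sketch below computes that percentage from a flat list of activations. The `activation_density` name and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample: 4 nonzero activations among 10,000 tokens.
sample = [0.0] * 10_000
for i in (10, 500, 4_200, 9_999):
    sample[i] = 1.3
print(f"{activation_density(sample):.3f}%")  # 0.040%
```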