Explanations
References to the word "its" and its variants
NEGATIVE LOGITS
pleaſure         -0.53
Preference       -0.50
IntoConstraints  -0.49
preference       -0.48
choice           -0.45
preferences      -0.44
experience       -0.44
perſon           -0.44
nadzieję         -0.44
Preference       -0.43
POSITIVE LOGITS
contents      0.81
nahilalakip   0.70
origins       0.69
inception     0.68
entirety      0.67
inhabitants   0.65
occupants     0.65
predecessor   0.63
namesake      0.62
contents      0.61
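Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding and ranking every vocabulary token by the resulting score. The sketch below shows that ranking step under that assumption. `top_logits`, `decoder_dir` and `unembed` are hypothetical names, and the toy vectors are invented for illustration. A real pipeline may also apply a final layer norm or other normalization that is omitted here.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def top_logits(decoder_dir: List[float],
               unembed: Dict[str, List[float]],
               k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive token scores.

    Assumption: a token's score is the dot product of the feature's decoder
    vector with that token's unembedding column (no layer norm applied).
    """
    # Direct effect of the feature on each token's logit.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy usage with a 2-dimensional "model" (invented values).
unembed = {
    "contents": [1.0, 0.0],
    "origins": [0.5, 0.5],
    "preference": [-1.0, 0.0],
    "choice": [0.0, -1.0],
}
neg, pos = top_logits([0.8, 0.2], unembed, k=2)
print(neg)
print(pos)
```

Repeated entries such as the two "contents" and "Preference" rows above would correspond to distinct vocabulary tokens (for example, with and without a leading space) that render identically.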
Activation density: 0.553%
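Activation density is usually read as the percentage of token positions in a sample on which the feature fires. The sketch below computes it under that assumption, treating "fires" as an activation strictly greater than a threshold of 0. The function name and the threshold parameter are hypothetical. The actual sample size and threshold behind the 0.553% figure are not stated here.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" = firing positions / all positions, as a percent.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# A feature that fires on 553 of 100,000 sampled tokens has a density of about 0.553%.
sample = [1.0] * 553 + [0.0] * (100_000 - 553)
print(activation_density(sample))
```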