INDEX
Explanations
words related to affection or preference
Negative Logits
purpoſe   -0.94
ſtate     -0.91
pleaſure  -0.91
houſe     -0.88
raiſ      -0.86
Jefus     -0.85
neceff    -0.83
himſelf   -0.83
itſelf    -0.82
poffible  -0.82
Positive Logits
__((           0.52
————————————   0.45
########.      0.44
gainera        0.43
]){            0.43
AssemblyTitle  0.43
createContext  0.42
ագրություններ  0.42
а              0.41
ือก            0.41
Activations Density 0.049%
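
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions at which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this dashboard derives it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density means "share of tokens on which the feature fires".
    The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 100,000 token positions, 49 of which activate the feature.
sample = [0.0] * 100_000
for i in range(49):
    sample[i * 2_000] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density * 100:.3f}%")  # prints "0.049%"
```

Under this reading, a density of 0.049% would correspond to the feature firing on roughly 49 out of every 100,000 tokens.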