INDEX
Explanations
phrases related to safe, supportive, and positive environments
Negative Logits
Token        Logit
ubar         -0.16
elmet        -0.15
їв           -0.15
xis          -0.14
овар         -0.14
andon        -0.14
ewitness     -0.13
ucci         -0.13
erp          -0.13
/***/        -0.13
Positive Logits
Token        Logit
warm         0.29
safe         0.29
atmosphere   0.27
welcoming    0.27
accepting    0.26
atmos        0.26
judgement    0.25
fun          0.25
judgment     0.24
friendly     0.24
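The two logit lists above read as the extremes of a per-token score: the most negative and the most positive entries. A minimal sketch of how such lists could be pulled from a token-to-score mapping follows. The names `logit_effects` and `top_and_bottom` are hypothetical, and the mapping holds only a handful of values copied from the lists above, not the full vocabulary.

```python
import heapq


def top_and_bottom(logit_effects, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    items = list(logit_effects.items())
    top = heapq.nlargest(k, items, key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return top, bottom


# Hypothetical mapping, filled with a few values from the lists above.
logit_effects = {
    "warm": 0.29,
    "atmosphere": 0.27,
    "friendly": 0.24,
    "ubar": -0.16,
    "elmet": -0.15,
    "erp": -0.13,
}

top, bottom = top_and_bottom(logit_effects, 2)
print(top)
print(bottom)
```

With a full vocabulary, the same call with `k=10` would yield lists of the same length as the ones shown here.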
Activations Density 0.105%
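The page does not define the density figure. One common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below illustrates that reading with synthetic data. The function `activation_density` and the sample sizes are hypothetical, chosen only so the result matches the 0.105% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 21 active tokens out of 20,000 sampled tokens.
acts = [0.0] * 19979 + [0.8] * 21
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.105% means the feature fires on roughly one token in 950.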