INDEX
Explanations
phrases related to feelings of safety, comfort, and discomfort
emotions related to safety and comfort
Negative Logits
recovered  -0.63
Airl       -0.63
angler     -0.62
odder      -0.62
Appears    -0.61
adra       -0.60
nm         -0.60
}"         -0.58
ummer      -0.58
tein       -0.58
Positive Logits
ULAR       0.78
vier       0.63
ient       0.62
Discuss    0.61
tics       0.61
Tur        0.60
ety        0.59
Premium    0.59
âĿ         0.57
uku        0.57
Activations Density 0.091%