INDEX
Explanations
positive personality traits such as being friendly and cheerful
words associated with friendliness and warmth in social interactions
Negative Logits
Token     Logit
illion    -0.83
rast      -0.80
liga      -0.80
ahar      -0.79
IGHTS     -0.78
ember     -0.78
anish     -0.75
hner      -0.75
ĸļ        -0.75
iple      -0.74
Positive Logits
Token        Logit
confines     0.92
friendly     0.84
minded       0.78
Friendly     0.76
lier         0.75
greeting     0.73
liness       0.72
hello        0.72
introdu      0.70
disposition  0.70
Activations Density 0.018%