INDEX
Explanations
words related to the concept of friendship and social connections
Negative Logits
icts      -0.17
eru       -0.17
mada      -0.17
erli      -0.16
erb       -0.15
apan      -0.15
ysa       -0.15
ercise    -0.14
amura     -0.14
wnd       -0.14
Positive Logits
bons      0.21
bage      0.20
rahim     0.19
onacci    0.19
ber       0.19
RARY      0.18
bling     0.18
lio       0.17
bles      0.16
blings    0.16
Activations Density: 0.029%