INDEX
Explanations
phrases related to clothing or physical appearance
forms of the word "bare" and its derivations, indicating a focus on exposure or vulnerability
Negative Logits
OD       -0.72
Mit      -0.69
oret     -0.67
engers   -0.67
Gray     -0.65
YS       -0.61
orem     -0.61
Bom      -0.60
ues      -0.59
OY       -0.59
Positive Logits
paren    0.90
yout     0.82
tsky     0.80
hot      0.75
fi       0.73
atches   0.72
ctic     0.71
thro     0.70
entin    0.70
mares    0.70
Activations Density 0.011%