Explanations
expressions related to personal aspirations and household dynamics
Negative Logits
Token      Logit
reck       -0.15
ariat      -0.14
zek        -0.14
balk       -0.14
uffman     -0.14
LB         -0.14
ienes      -0.13
aths       -0.13
utes       -0.13
Dog        -0.13
Positive Logits
Token      Logit
ocz         0.16
Terr        0.15
iple        0.15
Terr        0.14
isphere     0.14
acomp       0.13
embargo     0.13
gua         0.13
อม          0.13
oka         0.13
Activation density: 0.005%